We study the MIMO broadcast channel and compare the achievable throughput for the optimal
strategy of dirty paper coding to that achieved with sub-optimal and lower complexity linear precoding
(e.g., zero-forcing and block diagonalization) transmission. Both strategies utilize all available spatial
dimensions and therefore have the same multiplexing gain, but an absolute difference in terms of
throughput does exist. The sum rate difference between the two strategies is analytically computed at
asymptotically high SNR, and it is seen that this asymptotic statistic provides an accurate characterization
at even moderate SNR levels. Furthermore, the difference is not affected by asymmetric channel behavior
when each user has a different average SNR. Weighted sum rate maximization is also considered, and a
similar quantification of the throughput difference between the two strategies is performed. In the process,
it is shown that allocating user powers in direct proportion to user weights asymptotically maximizes
weighted sum rate. For multiple antenna users, uniform power allocation across the receive antennas is
applied after distributing power proportional to the user weight.

Index Terms 



Multiple antenna or MIMO, broadcast channel, high SNR analysis, zero-forcing, block diagonalization,
weighted sum rate.






I. Introduction 

The multiple antenna broadcast channel (BC) has recently been the subject of tremendous interest, 
primarily due to the realization that such a channel can provide MIMO spatial multiplexing benefits 
without requiring multiple antenna elements at the mobile devices [1]. Indeed, it is now well known 
that dirty paper coding (DPC) achieves the capacity region of the multiple antenna BC [2]. However, 
implementation of DPC requires significant additional complexity at both transmitter and receiver, and 
the problem of finding practical dirty paper codes that approach the capacity limit is still unsolved. 

On the other hand, linear precoding is a low complexity but sub-optimal transmission technique (with 
complexity roughly equivalent to point-to-point MIMO) that is able to transmit the same number of data 
streams as a DPC-based system. Linear precoding therefore achieves the same multiplexing gain (which
characterizes the slope of the capacity vs. SNR curve) as DPC, but incurs an absolute rate/power offset
relative to DPC. The contribution of this work is the quantification of this rate/power offset. 

In this work, we apply the high SNR affine approximation [3] to the sum rate capacity (DPC) and 
to the linear precoding sum rate. Both approximations have the same slope (i.e., multiplexing gain), but
by characterizing the difference in the additive terms, we determine the rate/power offset between the two
strategies. By averaging the per-realization rate offset over the iid Rayleigh fading distribution,
we are able to derive very simple expressions for the average rate offset as a function of only the number 
of transmit and receive antennas and users for systems in which the aggregate number of receive antennas 
is no larger than the number of transmit antennas. 

Note that previous work has analyzed the ratio between the sum rate capacity and the linear precoding 
sum rate [4] [5]. In this work we alternatively study the absolute difference between these quantities, which 
appears to be a more meaningful metric precisely because both strategies provide the same multiplexing 
gain. 

In addition to sum rate, we also study weighted sum rate maximization (using DPC and linear 
precoding) and provide simple expressions for the rate offsets in this scenario. One of the most interesting 
results is that weighted sum rate (for either DPC or linear precoding) is maximized at asymptotically high 
SNR by allocating power directly proportional to user weights. A similar result was recently observed 
in [6] in the context of parallel single-user channels (e.g., for OFDMA systems). Because the linear 
precoding strategies we study result in parallel channels, the result of [6] shows that it is asymptotically 
optimal to allocate power in direct proportion to user weights whenever linear precoding is used. By 
showing that weighted sum rate maximization when DPC is employed can also be simplified to power 
allocation over parallel channels, we are able to show that the same strategy is also asymptotically optimal
for DPC. To illustrate the utility of this simple yet asymptotically optimal power allocation policy, we
apply it to a system employing queue-based scheduling (at finite SNR) and see that it performs extremely
close to the true optimal weighted sum rate maximization.

This paper is organized as follows: Section II presents the system model and Section III introduces the 
high SNR capacity approximation from [3]. Section IV describes dirty paper coding and linear precoding 
and derives simple expressions for their sum rates at high SNR, and in Section V the relative rate/power 
offsets between DPC and linear precoding are computed. Section VI extends the analysis to weighted 
sum rate maximization and considers a queue-based system with the weighted sum rate solution. We 
conclude in Section VII. 

II. System Model 

We consider a $K$-user Gaussian MIMO BC in which the transmitter has $M$ antennas and each receiver
has $N$ antennas with $M \ge KN$, i.e., the number of transmit antennas is no smaller than the aggregate
number of receive antennas. The received signal $\mathbf{y}_k$ of user $k$ is given by

$$\mathbf{y}_k = \mathbf{H}_k\mathbf{x} + \mathbf{n}_k, \quad k = 1, \dots, K, \tag{1}$$

where $\mathbf{H}_k$ ($\in \mathbb{C}^{N \times M}$) is the channel gain matrix for user $k$, $\mathbf{x}$ is the transmit signal vector having a
power constraint $\mathrm{tr}(E[\mathbf{x}\mathbf{x}^H]) \le P$, and $\mathbf{n}_k$ ($k = 1, \dots, K$) is complex Gaussian noise with unit variance
per vector component (i.e., $E[\mathbf{n}_k\mathbf{n}_k^H] = \mathbf{I}$). We assume that the transmitter has perfect knowledge of all
channel matrices and each receiver has perfect knowledge of its own channel matrix. For the sake of
notation, the concatenation of the channels is denoted by $\mathbf{H}^H = [\mathbf{H}_1^H \cdots \mathbf{H}_K^H]$ ($\in \mathbb{C}^{M \times KN}$), which
can be decomposed into row vectors as $\mathbf{H}^H = [\mathbf{h}_{1,1}^H \cdots \mathbf{h}_{1,N}^H \cdots \mathbf{h}_{K,1}^H \cdots \mathbf{h}_{K,N}^H]$, where
$\mathbf{h}_{k,n}$ ($\in \mathbb{C}^{1 \times M}$) is the $n$th row of $\mathbf{H}_k$. We develop rate offset expressions on a per-realization basis as well
as averaged over the standard iid Rayleigh fading distribution, where the entries of $\mathbf{H}$ are iid complex
Gaussian with unit variance.

Notations: Boldface letters denote matrix-vector quantities. The operations $\mathrm{tr}(\cdot)$ and $(\cdot)^H$ represent
the trace and the Hermitian transpose of a matrix, respectively. The operations $|\cdot|$ and $\|\cdot\|$ denote the
determinant of a matrix and the Euclidean norm of a vector, respectively. The operations $E[\cdot]$ and $I(\cdot)$
denote the expectation and the mutual information, respectively.
III. High SNR Approximation

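This section describes the key analytical tool used in the paper, namely the affine approximation to
capacity at high SNR developed by Shamai and Verdú [3]. At high SNR, the channel capacity $C(P)$ is
well approximated by an affine function of SNR ($P$):

$$\begin{aligned} C(P) &= \mathcal{S}_\infty\left(\log_2 P - \mathcal{L}_\infty\right) + o(1) \\ &= \mathcal{S}_\infty\left(\frac{P|_{\mathrm{dB}}}{3\,\mathrm{dB}} - \mathcal{L}_\infty\right) + o(1), \end{aligned} \tag{2}$$

where $\mathcal{S}_\infty$ represents the multiplexing gain, $\mathcal{L}_\infty$ represents the power offset (in 3 dB units), and the $o(1)$
term vanishes as $P \to \infty$. The multiplexing gain and the power offset are defined as:

$$\mathcal{S}_\infty = \lim_{P \to \infty}\frac{C(P)}{\log_2(P)}, \tag{3}$$

$$\mathcal{L}_\infty = \lim_{P \to \infty}\left(\log_2(P) - \frac{C(P)}{\mathcal{S}_\infty}\right). \tag{4}$$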
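As a concrete illustration of definitions (3) and (4), the following Python sketch evaluates the expressions inside the limits at finite $P$ for a toy case in which the limits are known: $n$ parallel channels with hypothetical gains $g_i$ and uniform power, for which $\mathcal{S}_\infty = n$ and $\mathcal{L}_\infty = \log_2 n - \frac{1}{n}\sum_i \log_2 g_i$. The gains and function names are illustrative assumptions, not part of the analysis.

```python
import math

def parallel_capacity(P, gains):
    """Sum rate (bps/Hz) of parallel channels log2(1 + (P/n) g_i) with uniform power."""
    n = len(gains)
    return sum(math.log2(1.0 + (P / n) * g) for g in gains)

def finite_p_estimates(P, gains):
    """Evaluate the expressions inside the limits of (3) and (4) at a finite, linear-scale P."""
    C = parallel_capacity(P, gains)
    S_est = C / math.log2(P)            # (3) before the limit; error decays only like 1/log2(P)
    S_inf = len(gains)                  # exact multiplexing gain of n parallel channels
    L_est = math.log2(P) - C / S_inf    # (4) before the limit
    return S_est, L_est

# Hypothetical channel gains, chosen only for illustration.
gains = [0.5, 1.0, 2.0]
n = len(gains)
# High-SNR limit: C(P) ~ n log2 P - n log2 n + sum(log2 g_i), so L_inf = log2 n - mean(log2 g_i).
L_closed = math.log2(n) - sum(math.log2(g) for g in gains) / n

for P_dB in (10, 20, 40, 80):
    S_est, L_est = finite_p_estimates(10 ** (P_dB / 10), gains)
    print(f"P = {P_dB:2d} dB: S_est = {S_est:.3f}, L_est = {L_est:.4f} (limit {L_closed:.4f})")
```

The $\mathcal{L}_\infty$ estimate settles within a few tens of dB, while the ratio in (3) approaches $n$ only like $1/\log_2 P$.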

This high SNR approximation is studied for point-to-point MIMO channels in [7]. In this context 
the multiplexing gain $\mathcal{S}_\infty$ is well known to equal the minimum of the number of transmit and receive
antennas, and thus is essentially independent of the fading environment. However, the rate offset does
depend on the actual fading statistics (and possibly on the level of channel state information available
to the transmitter as well), and [7] provides exact characterizations of these offset terms for the most
common fading models, such as iid Rayleigh fading, spatially correlated fading, and Ricean (line-of-sight)
fading. Indeed, one of the key insights of [7] is the necessity to consider these rate offset terms, because
considering only the multiplexing gain can lead to erroneous conclusions, e.g., that spatial correlation
does not affect MIMO systems at high SNR.

In a similar vein, in this work we utilize the high SNR approximation to quantify the difference 
between optimal dirty paper coding and simpler linear precoding in an iid Rayleigh fading environment. 
The multiplexing gain is easily seen to be the same for both strategies, but a non-negligible difference 
exists between the rate offsets. By investigating the differential offsets between these two strategies, we 
are able to very precisely quantify the throughput degradation that results from using linear precoding 
rather than the optimal DPC strategy in spatially white fading.¹

Although the high SNR approximation is exact only at asymptotically high SNR, it is seen to provide 
very accurate results for a wide range of SNR values, e.g., on the order of 5 dB and higher. Because 
multi-user MIMO systems generally provide a substantial advantage over point-to-point systems (e.g., 
TDMA-based systems) at moderate and high SNRs, the approximation is accurate in the range of interest.

IV. Dirty Paper Coding vs. Linear Precoding 

In this section we compute the affine approximation to the dirty paper coding sum rate and the linear 
precoding sum rate using the high SNR approximation. 

A. Dirty Paper Coding 

Dirty paper coding (DPC) is a pre-coding technique that allows for pre-cancellation of interference at 
the transmitter. Costa introduced DPC while considering an AWGN channel with additive interference 
known non-causally at the transmitter but not at the receiver [8]. DPC was applied to the MIMO broadcast 
channel, where it can be used to pre-cancel multi-user interference, by Caire and Shamai and was shown 
to achieve the sum capacity of the 2-user, M > 1, N = 1 MIMO broadcast channel [1]. The optimality 
of DPC was later extended to the sum capacity of the MIMO broadcast channel with an arbitrary number 
of users and antennas [9] [10] [11], and more recently has been extended to the full capacity region [2]. 

¹Although we do not pursue this avenue in the present publication, it would also be interesting to investigate the DPC-linear
precoding offset in other fading environments, e.g., Ricean and spatially correlated fading. However, one must be careful with
respect to channel models because some point-to-point MIMO models do not necessarily extend well to the MIMO broadcast
channel. For example, in point-to-point channels spatial correlation captures the effect of sparse scattering at the transmitter
and/or receiver and is a function of the angle-of-arrival. In a broadcast channel, the angle-of-arrival is typically different for
every receiver because they generally are not physically co-located; as a result, using the same correlation matrix for all receivers
is not well motivated in this context.

We now describe the transmit signal when DPC is utilized. Let $\mathbf{s}_k$ ($\in \mathbb{C}^{N \times 1}$) be the $N$-dimensional
vector of data symbols intended for user $k$ and $\mathbf{V}_k$ ($\in \mathbb{C}^{M \times N}$) be its precoding matrix. Then the transmit
signal vector $\mathbf{x}$ can (roughly) be represented as

$$\mathbf{x} = \mathbf{V}_1\mathbf{s}_1 \oplus \left(\mathbf{V}_2\mathbf{s}_2 \oplus \cdots \left(\mathbf{V}_{K-2}\mathbf{s}_{K-2} \oplus \left(\mathbf{V}_{K-1}\mathbf{s}_{K-1} \oplus \mathbf{V}_K\mathbf{s}_K\right)\right)\cdots\right), \tag{5}$$

where $\oplus$ represents the non-linear dirty paper sum. Here we have assumed, without loss of generality,
that the encoding process is performed in descending numerical order. Dirty-paper decoding at the $k$th
receiver results in cancellation of $\mathbf{V}_{k+1}\mathbf{s}_{k+1}, \dots, \mathbf{V}_K\mathbf{s}_K$, and thus the effective received signal at user $k$
is:

$$\mathbf{y}_k = \mathbf{H}_k\mathbf{V}_k\mathbf{s}_k + \sum_{j=1}^{k-1}\mathbf{H}_k\mathbf{V}_j\mathbf{s}_j + \mathbf{n}_k, \tag{6}$$

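where the second term is the multi-user interference that is not cancelled by DPC. If the $\mathbf{s}_k$ are chosen
Gaussian, user $k$'s rate is given by:

$$R_k = \log_2\frac{\left|\mathbf{I} + \mathbf{H}_k\left(\sum_{j=1}^{k}\boldsymbol{\Sigma}_j\right)\mathbf{H}_k^H\right|}{\left|\mathbf{I} + \mathbf{H}_k\left(\sum_{j=1}^{k-1}\boldsymbol{\Sigma}_j\right)\mathbf{H}_k^H\right|}, \tag{7}$$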

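where $\boldsymbol{\Sigma}_j = \mathbf{V}_j E[\mathbf{s}_j\mathbf{s}_j^H]\mathbf{V}_j^H$ denotes the transmit covariance matrix of user $j$. Since DPC is optimal, the
sum capacity of the MIMO BC can be expressed as:

$$C_{\mathrm{DPC}}(\mathbf{H}, P) = \max_{\sum_k \mathrm{tr}(\boldsymbol{\Sigma}_k) \le P}\sum_{k=1}^{K}\log_2\frac{\left|\mathbf{I} + \mathbf{H}_k\left(\sum_{j=1}^{k}\boldsymbol{\Sigma}_j\right)\mathbf{H}_k^H\right|}{\left|\mathbf{I} + \mathbf{H}_k\left(\sum_{j=1}^{k-1}\boldsymbol{\Sigma}_j\right)\mathbf{H}_k^H\right|}, \tag{8}$$

where the maximization is performed over the transmit covariance matrices $\boldsymbol{\Sigma}_1, \boldsymbol{\Sigma}_2, \dots, \boldsymbol{\Sigma}_K$.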

The duality of the MIMO broadcast channel and the MIMO multiple access channel (MAC) allows 
the sum capacity to alternatively be written as [9]: 



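$$C_{\mathrm{DPC}}(\mathbf{H}, P) = \max_{\sum_k \mathrm{tr}(\mathbf{Q}_k) \le P}\log_2\left|\mathbf{I} + \sum_{k=1}^{K}\mathbf{H}_k^H\mathbf{Q}_k\mathbf{H}_k\right|, \tag{9}$$

where $\mathbf{Q}_1, \dots, \mathbf{Q}_K$ represent the $N \times N$ transmit covariance matrices in the dual MAC.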

No closed-form solution to either (8) or (9) (which is a convex problem) is known to exist, but
it has been shown that $C_{\mathrm{DPC}}(\mathbf{H}, P)$ converges (absolutely) to the capacity of the point-to-point MIMO
channel with transfer matrix $\mathbf{H}$ whenever $M \ge KN$:

Theorem 1 (Theorem 3 in [1]): When $M \ge KN$ and $\mathbf{H}$ has full row rank,

$$\lim_{P \to \infty}\left[C_{\mathrm{DPC}}(\mathbf{H}, P) - \log_2\left|\mathbf{I} + \frac{P}{KN}\mathbf{H}^H\mathbf{H}\right|\right] = 0. \tag{10}$$

We are now able to make a few important observations regarding the optimal covariance matrices at
high SNR. Since

$$\log_2\left|\mathbf{I} + \sum_{k=1}^{K}\mathbf{H}_k^H\left(\frac{P}{KN}\mathbf{I}\right)\mathbf{H}_k\right| = \log_2\left|\mathbf{I} + \frac{P}{KN}\mathbf{H}^H\mathbf{H}\right|, \tag{11}$$

choosing each of the dual MAC covariance matrices as $\mathbf{Q}_k = \frac{P}{KN}\mathbf{I}$ in (9) achieves sum capacity at
asymptotically high SNR. Thus, uniform power allocation across the $KN$ antennas in the dual MAC
is asymptotically optimal. It is also possible to determine the optimal form of the downlink covariance
matrices $\boldsymbol{\Sigma}_1, \dots, \boldsymbol{\Sigma}_K$, or equivalently of the downlink precoding matrices $\mathbf{V}_1, \dots, \mathbf{V}_K$. When $N = 1$,
Theorem 3 of [1] shows that a technique referred to as zero-forcing DPC asymptotically achieves sum
capacity. Zero-forcing DPC, which is implemented via the QR-decomposition of the concatenated channel
matrix $\mathbf{H}$ in [1], implies that the precoding matrices $\mathbf{V}_1, \dots, \mathbf{V}_K$ are chosen to completely eliminate
multi-user interference, and thus to satisfy $\mathbf{H}_k\mathbf{V}_j = \mathbf{0}$ for all $j < k$. Because DPC eliminates some of
the multi-user interference terms, $\mathbf{V}_K$ has no constraint on it, $\mathbf{V}_{K-1}$ must be orthogonal to $\mathbf{H}_K$, $\mathbf{V}_{K-2}$ must
be orthogonal to $\mathbf{H}_{K-1}$ and $\mathbf{H}_K$, and so forth. If multi-user interference is eliminated, then the system
decouples into $K$ parallel channels, and simply using equal power allocation across all of the channels
is asymptotically optimal due to the well known fact that waterfilling over parallel channels provides no
advantage at high SNR.

As a result of Theorem 1, an affine approximation for the sum rate can be found as:

$$C_{\mathrm{DPC}}(\mathbf{H}, P) \cong KN\log_2 P - KN\log_2 KN + \log_2|\mathbf{H}\mathbf{H}^H|, \tag{12}$$

where $\cong$ refers to equivalence in the limit (i.e., the difference between both sides converges to zero as
$P \to \infty$). Since the MIMO broadcast and the $M \times KN$ point-to-point MIMO channel are equivalent
at high SNR, the high SNR results developed in [7] directly apply to the sum capacity of the MIMO
broadcast channel. It is important to be careful regarding the ordering of the equivalent point-to-point
MIMO channel: due to the assumption that $M \ge KN$, the MIMO broadcast is equivalent to the $M \times KN$
MIMO channel with CSI at the transmitter, which is equivalent to the $KN \times M$ MIMO channel with
or without CSI at the transmitter. When $M > KN$, the level of CSI at the transmitter affects the rate
offset of the M x KN point-to-point MIMO channel. Finally, notice that the high SNR sum rate capacity 
only depends on the product of K and N and not on their specific values; this is not the case for linear 
precoding. 

B. Linear Precoding 

Linear precoding is a low-complexity, albeit sub-optimal, alternative to DPC. When linear precoding 
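is used, the transmit signal vector $\mathbf{x}$ is a linear function of the symbols $\mathbf{s}_k$ ($\in \mathbb{C}^{N \times 1}$), $k = 1, \dots, K$:

$$\mathbf{x} = \sum_{k=1}^{K}\mathbf{V}_k\mathbf{s}_k, \tag{13}$$

where $\mathbf{V}_k$ ($\in \mathbb{C}^{M \times N}$) is the precoding matrix for user $k$. This expression illustrates linear precoding's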
complexity advantage: if DPC is used, the transmit signal is formed by performing dirty-paper sums, which 
are complex non-linear operations, whereas linear precoding requires only standard linear operations. The 
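resulting received signal for user $k$ is given by

$$\mathbf{y}_k = \mathbf{H}_k\mathbf{V}_k\mathbf{s}_k + \sum_{j \ne k}\mathbf{H}_k\mathbf{V}_j\mathbf{s}_j + \mathbf{n}_k, \tag{14}$$

where the second term in (14) represents the multi-user interference. If single-user detection and Gaussian
signalling are used, the achievable rate of user $k$ is:

$$R_k = I(\mathbf{s}_k; \mathbf{y}_k) = \log_2\frac{\left|\mathbf{I} + \mathbf{H}_k\left(\sum_{j=1}^{K}\boldsymbol{\Sigma}_j\right)\mathbf{H}_k^H\right|}{\left|\mathbf{I} + \mathbf{H}_k\left(\sum_{j \ne k}\boldsymbol{\Sigma}_j\right)\mathbf{H}_k^H\right|}. \tag{15}$$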



Since DPC is not used, each user is subject to multi-user interference from every other user's signal. As 
a result, the precoding matrices must satisfy very stringent conditions in order to eliminate multi-user 
interference. Note that eliminating multi-user interference is desirable at high SNR in order to prevent 
interference-limited behavior. 

In this paper we consider two linear precoding schemes that eliminate multi-user interference when
$M \ge KN$: zero-forcing (ZF) and block diagonalization (BD). The precoding matrices $\{\mathbf{V}_j\}_{j=1}^{K}$ for BD
are chosen such that for all $j (\ne k) \in [1, K]$,

$$\mathbf{H}_k\mathbf{V}_j = \mathbf{0}, \tag{16}$$

while those for ZF are chosen so that

$$\mathbf{h}_{k,n}\mathbf{v}_{j,l} = 0, \quad \forall j (\ne k) \in [1, K], \ \forall n, l \in [1, N], \tag{17}$$

$$\mathbf{h}_{k,n}\mathbf{v}_{k,l} = 0, \quad \forall l (\ne n) \in [1, N], \tag{18}$$

where $\mathbf{v}_{j,l}$ denotes the $l$th column vector of $\mathbf{V}_j$. Consequently, performing ZF in a system with $K$ users
with $N (> 1)$ antennas is equivalent to performing ZF in a channel with $KN$ single antenna receivers.
Note that H having full row rank is sufficient to ensure ZF and BD precoding matrices exist. In iid 
Rayleigh fading H has full row rank with probability one. 

1) Zero-forcing: When ZF is employed, there is no multi-user or inter-antenna interference. Then the
received signal at the $n$th antenna of user $k$ is given by

$$y_{k,n} = \mathbf{h}_{k,n}\mathbf{v}_{k,n}s_{k,n} + n_{k,n}, \quad n = 1, \dots, N, \tag{19}$$

where $s_{k,n}$ and $n_{k,n}$ denote the $n$th components of $\mathbf{s}_k$ and $\mathbf{n}_k$, respectively. Thus, ZF converts the system
into $KN$ parallel channels with effective channel $g_{k,n} = \mathbf{h}_{k,n}\mathbf{v}_{k,n}$. Sum rate is maximized by optimizing
power allocation across these parallel channels:

$$C_{\mathrm{ZF}}(\mathbf{H}, P) = \max_{\sum_k\sum_n P_{k,n} \le P}\sum_{k=1}^{K}\sum_{n=1}^{N}\log_2\left(1 + P_{k,n}|g_{k,n}|^2\right). \tag{20}$$
Since the optimum power allocation policy converges to uniform power at asymptotically high SNR [12],
we have:

$$C_{\mathrm{ZF}}(\mathbf{H}, P) \cong KN\log_2 P - KN\log_2 KN + \log_2\prod_{k=1}^{K}\prod_{n=1}^{N}|g_{k,n}|^2. \tag{21}$$

This approximation is identical to that for DPC in (12) except for the final constant term.
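The statement that the optimal ZF power allocation in (20) converges to uniform power can be checked numerically. The sketch below implements standard waterfilling over parallel channels in plain Python and reports how much it gains over the uniform allocation $P_{k,n} = P/KN$ used in (21). The effective gains $|g_{k,n}|^2$ and the helper names are hypothetical, chosen only for illustration.

```python
import math

def waterfill(gains, P):
    """Powers maximizing sum(log2(1 + p_i * g_i)) subject to sum(p_i) = P, p_i >= 0."""
    inv = sorted(1.0 / g for g in gains)          # 1/g_i, strongest channel first
    for k in range(len(inv), 0, -1):              # try k active channels, largest k first
        mu = (P + sum(inv[:k])) / k               # water level if the k strongest are active
        if mu > inv[k - 1]:                       # weakest active channel gets positive power
            break
    return [max(0.0, mu - 1.0 / g) for g in gains]

def sum_rate(gains, powers):
    return sum(math.log2(1.0 + p * g) for g, p in zip(gains, powers))

def waterfilling_gain(gains, P):
    """Rate gain (bps/Hz) of optimal power allocation over uniform power P/len(gains)."""
    uniform = [P / len(gains)] * len(gains)
    return sum_rate(gains, waterfill(gains, P)) - sum_rate(gains, uniform)

# Hypothetical effective gains |g_{k,n}|^2 for KN = 4 ZF channels.
gains = [0.2, 0.7, 1.5, 3.0]
for P_dB in (0, 10, 20, 30, 40):
    gap = waterfilling_gain(gains, 10 ** (P_dB / 10))
    print(f"P = {P_dB:2d} dB: waterfilling gain = {gap:.6f} bps/Hz")
```

With these gains the waterfilling advantage is about 0.6 bps/Hz at 0 dB and shrinks toward zero as $P$ grows, which is the behavior that justifies (21).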

2) Block Diagonalization: When BD is employed, there is no multi-user interference because the
precoding matrices for BD are chosen such that $\mathbf{H}_k\mathbf{V}_j = \mathbf{0}$ for $k \ne j$. Then the received signal for user $k$ is
given by

$$\mathbf{y}_k = \mathbf{H}_k\mathbf{V}_k\mathbf{s}_k + \mathbf{n}_k. \tag{22}$$

Thus, BD converts the system into $K$ parallel MIMO channels with effective channel matrices
$\mathbf{G}_k = \mathbf{H}_k\mathbf{V}_k$, $k = 1, \dots, K$. The BD sum rate is given by [13][14]

$$C_{\mathrm{BD}}(\mathbf{H}, P) = \max_{\mathbf{Q}_k: \sum_k \mathrm{tr}\{\mathbf{Q}_k\} \le P}\sum_{k=1}^{K}\log_2\left|\mathbf{I} + \mathbf{G}_k\mathbf{Q}_k\mathbf{G}_k^H\right|, \tag{23}$$

and the optimal rate is achieved asymptotically by uniform power allocation at high SNR since the
channel can be decomposed into parallel channels. Hence, the sum rate is asymptotically given by

$$C_{\mathrm{BD}}(\mathbf{H}, P) \cong KN\log_2 P - KN\log_2 KN + \log_2\prod_{k=1}^{K}|\mathbf{G}_k^H\mathbf{G}_k|. \tag{24}$$

C. Equivalent MIMO Interpretation 

Due to the properties of iid Rayleigh fading, systems employing either zero-forcing or block diagonalization
are equivalent to parallel point-to-point MIMO channels, as shown in [13]. When ZF is used, the
precoding vector for each receive antenna (i.e., each row of the concatenated channel matrix $\mathbf{H}$) must be
chosen orthogonal to the other $KN - 1$ rows of $\mathbf{H}$. Due to the isotropic nature of iid Rayleigh fading,
this orthogonality constraint consumes $KN - 1$ degrees of freedom at the transmitter, and reduces the
channel from the $1 \times M$ vector $\mathbf{h}_{k,n}$ to a $1 \times (M - KN + 1)$ Gaussian vector. As a result, the effective
channel norm $|g_{k,n}|^2$ of each parallel channel is chi-squared with $2(M - KN + 1)$ degrees of freedom
(denoted $\chi^2_{2(M-KN+1)}$). Therefore, a ZF-based system with uniform power loading is exactly equivalent
(in terms of ergodic throughput) to $KN$ parallel $(M - KN + 1) \times 1$ MIMO channels (with CSIT).
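The degree-of-freedom argument for ZF can be checked by simulation. The following plain-Python sketch (an illustration, not the authors' code) draws iid CN(0,1) channel rows, removes from one row its components along the other $KN - 1$ rows, which is what the ZF constraint forces on the best unit-norm precoding vector, and compares the sample statistics of $|g_{k,n}|^2$ with $M - KN + 1$ (the mean of $\chi^2_{2(M-KN+1)}$ under the unit-variance complex normalization used here) and with $\psi(M - KN + 1)$ from (32). The values $M = 6$, $KN = 4$, the seed, and the trial count are arbitrary choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(m):
    """psi(m) for a positive integer m, using psi(m) = psi(1) + sum_{l<m} 1/l."""
    return -EULER_GAMMA + sum(1.0 / l for l in range(1, m))

def cn01(rng):
    """One CN(0,1) sample (unit total variance)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def inner(a, b):
    """<a, b> = sum_i a_i * conj(b_i)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

def zf_gain_sample(M, KN, rng):
    """One draw of |g|^2: energy of a channel row left after removing its components
    along the other KN - 1 rows (modified Gram-Schmidt). Rows are drawn directly in
    conjugated form; CN(0,1) entries are conjugation-invariant, so nothing changes."""
    rows = [[cn01(rng) for _ in range(M)] for _ in range(KN)]
    basis = []
    for r in rows[1:]:
        v = r[:]
        for b in basis:
            c = inner(v, b)
            v = [x - c * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        basis.append([x / norm for x in v])
    res = rows[0][:]
    for b in basis:
        c = inner(res, b)
        res = [x - c * y for x, y in zip(res, b)]
    return sum(abs(x) ** 2 for x in res)

rng = random.Random(1)
M, KN, trials = 6, 4, 5000
samples = [zf_gain_sample(M, KN, rng) for _ in range(trials)]
mean = sum(samples) / trials
mean_log = sum(math.log(s) for s in samples) / trials
print(f"mean |g|^2   = {mean:.3f}  (M - KN + 1 = {M - KN + 1})")
print(f"mean ln|g|^2 = {mean_log:.3f}  (psi(M - KN + 1) = {digamma_int(M - KN + 1):.3f})")
```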

When BD is used, the orthogonality constraint consumes $(K - 1)N$ degrees of freedom. This reduces
the channel matrix, which is originally $N \times M$, to an $N \times (M - (K - 1)N)$ complex Gaussian matrix.
As a result, the $N \times N$ matrix $\mathbf{G}_k^H\mathbf{G}_k$ is Wishart with $M - (K - 1)N$ degrees of freedom, and therefore
a BD-based system is equivalent to $K$ parallel $(M - (K - 1)N) \times N$ MIMO channels (with
CSIT).

TABLE I
Sum rates at high SNR and their equivalent MIMO interpretation

Scheme | $C(\mathbf{H}, P)$ | MIMO interpretation
DPC | $\log_2\left\lvert\frac{P}{KN}\mathbf{H}\mathbf{H}^H\right\rvert$ | one $M \times KN$
BD | $\sum_{k=1}^{K}\log_2\left\lvert\frac{P}{KN}\mathbf{G}_k^H\mathbf{G}_k\right\rvert$ | $K$ parallel $(M - (K - 1)N) \times N$
ZF | $\sum_{k=1}^{K}\sum_{n=1}^{N}\log_2\left(\frac{P}{KN}\lvert g_{k,n}\rvert^2\right)$ | $KN$ parallel $(M - KN + 1) \times 1$

Finally, when DPC is used, the MIMO broadcast channel is equivalent to the $M \times KN$ point-to-point
MIMO channel, where $M \ge KN$ and CSIT is again assumed. Note that a MIMO channel of this
dimension can be interpreted as a series of parallel channels as well: in this case, the $M \times KN$ channel
is equivalent to $M \times 1$, $(M - 1) \times 1$, ..., $(M - KN + 1) \times 1$ channels in parallel [15].

For all three cases, the MIMO equivalence is exact when uniform power loading is used with ZF
($P_{k,n} = \frac{P}{KN}$ for all $k, n$ in (20)), BD ($\mathbf{Q}_k = \frac{P}{KN}\mathbf{I}$ for all $k$ in (23)), and DPC ($\mathbf{Q}_k = \frac{P}{KN}\mathbf{I}$ for all $k$ in
(9)). If optimal power allocation is performed for either ZF, BD, or DPC, the MIMO broadcast systems
can achieve a larger ergodic throughput than the MIMO equivalent at finite SNR. However, because
waterfilling provides a vanishing benefit as SNR is increased, this advantage disappears at asymptotically
high SNR.

The equivalent MIMO channels are summarized in Table I and illustrated in Fig. 1 for $M = 7$, $N = 2$,
$K = 3$. In this case ZF is equivalent to 6 parallel $2 \times 1$ channels, BD is equivalent to 3 parallel $3 \times 2$
channels, and DPC is equivalent to a $7 \times 6$ channel. The absolute difference in throughput at asymptotically
high SNR is indeed due to the difference in the degrees of freedom in the available parallel channels, as
made precise in the following section.

Our analysis is limited to channels in which $M \ge KN$. If $M < KN$, i.e., there are strictly fewer
transmit antennas than aggregate receive antennas, then no MIMO equivalent channel exists for either
DPC or linear precoding. The sum capacity (DPC) is smaller than the capacity of the $M \times KN$ (forward)
cooperative channel (in which CSIT is not required at high SNR), but is larger than the capacity of the
reverse $KN \times M$ cooperative channel without CSIT. Zero-forcing and block diagonalization are clearly
only feasible when the number of data streams is no greater than $M$. Thus, if there are more than
$M$ receive antennas, some form of selection (of users and possibly of the number of data streams per
receiver) must be performed. As a result of these complications, it does not appear that the high SNR
framework will yield closed-form solutions for either DPC or linear precoding when $M < KN$.

[Figure 1: panels (a) DPC, MIMO; (b) BD, parallel MIMO; (c) ZF, parallel MISO]

Fig. 1. The broadcast channel with $M = 7$, $N = 2$, and $K = 3$ can be interpreted in terms of its sum rate as (a) a $7 \times 6$
point-to-point MIMO channel when DPC is employed, (b) 3 parallel $3 \times 2$ MIMO channels when BD is employed, and (c) 6
parallel $2 \times 1$ MISO channels when ZF is employed.

V. Sum Rate Analysis 

This section quantifies the sum rate degradation incurred by linear precoding relative to DPC. In terms
of the high SNR approximation, this rate offset is essentially equal to the difference between the
$\mathcal{L}_\infty$ terms for DPC and linear precoding, scaled by the common multiplexing gain.

A. DPC vs. ZF 

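We define the rate loss as the asymptotic (in SNR) difference between the sum rate capacity and the
zero-forcing sum rate:

$$\beta_{\mathrm{DPC-ZF}}(\mathbf{H}) \triangleq \lim_{P \to \infty}\left[C_{\mathrm{DPC}}(\mathbf{H}, P) - C_{\mathrm{ZF}}(\mathbf{H}, P)\right]. \tag{25}$$

Since each of the capacity curves has a slope of $KN/3$ in units of bps/Hz/dB, this rate offset (i.e., the
vertical offset between capacity vs. SNR curves) can be immediately translated into a power offset (i.e.,
a horizontal offset): $\Delta_{\mathrm{DPC-ZF}}(\mathbf{H}) = \frac{3}{KN}\beta_{\mathrm{DPC-ZF}}(\mathbf{H})$ dB. Because $\Delta_{\mathrm{DPC-ZF}}$ is in dB units, we clearly have
$\Delta_{\mathrm{DPC-ZF}}(\mathbf{H}) = 3\left(\mathcal{L}_\infty^{\mathrm{ZF}}(\mathbf{H}) - \mathcal{L}_\infty^{\mathrm{DPC}}(\mathbf{H})\right)$, which implies

$$\mathcal{L}_\infty^{\mathrm{ZF}} = \mathcal{L}_\infty^{\mathrm{DPC}} + \frac{1}{3}\Delta_{\mathrm{DPC-ZF}} \tag{26}$$

$$\mathcal{L}_\infty^{\mathrm{ZF}} = \mathcal{L}_\infty^{\mathrm{DPC}} + \frac{1}{KN}\beta_{\mathrm{DPC-ZF}}. \tag{27}$$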
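Relations (26) and (27) are unit conversions between a vertical (rate) offset and a horizontal (power) offset. A minimal sketch, with hypothetical function names and an arbitrary placeholder value for $\mathcal{L}_\infty^{\mathrm{DPC}}$, makes the bookkeeping explicit:

```python
def rate_to_power_offset_db(beta_bps, mux_gain):
    """Horizontal (power) offset in dB between two affine curves with common slope
    mux_gain bps/Hz per 3 dB that are vertically separated by beta_bps bps/Hz."""
    return 3.0 * beta_bps / mux_gain

def zf_offset_from_dpc(L_dpc, beta_bps, KN):
    """(27): L_inf^ZF = L_inf^DPC + beta / KN, in 3-dB units."""
    return L_dpc + beta_bps / KN

KN = 5
beta = 9.26                       # example rate loss in bps/Hz
delta_db = rate_to_power_offset_db(beta, KN)
L_dpc = 1.0                       # hypothetical DPC offset, for illustration only
L_zf = zf_offset_from_dpc(L_dpc, beta, KN)
# (26): the same power offset recovered from the difference of the L_inf terms.
print(f"power offset = {delta_db:.3f} dB; 3*(L_zf - L_dpc) = {3 * (L_zf - L_dpc):.3f} dB")
```

Both routes give the same offset (about 5.56 dB here); the value is independent of the placeholder $\mathcal{L}_\infty^{\mathrm{DPC}}$ because only the difference of the offsets enters.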
From the affine approximations to the DPC and ZF sum rates found in (12) and (21), the rate loss incurred
by ZF is:

$$\beta_{\mathrm{DPC-ZF}}(\mathbf{H}) = \log_2\frac{|\mathbf{H}\mathbf{H}^H|}{\prod_{k=1}^{K}\prod_{n=1}^{N}|g_{k,n}|^2}. \tag{28}$$

While the above metric is the rate loss per realization, we are more interested in the average rate offset
across the fading distribution:

$$\bar{\beta}_{\mathrm{DPC-ZF}} = E_{\mathbf{H}}\left[\beta_{\mathrm{DPC-ZF}}(\mathbf{H})\right], \tag{29}$$

which allows a comparison of ergodic (over the fading distribution) throughput. Likewise, the average
power offset is denoted as $\bar{\Delta}_{\mathrm{DPC-ZF}}$ and can be immediately calculated in the same fashion. Under iid
Rayleigh fading, the matrix $\mathbf{H}\mathbf{H}^H$ is complex Wishart with $M$ degrees of freedom while the $|g_{k,n}|^2$ are identically
distributed $\chi^2_{2(M-KN+1)}$, as explained in Section IV-C.

The key to computing the average offset is the following closed form expression for the expectation
of the log determinant of a Wishart matrix:

Lemma 1 (Theorem 2.11 of [16]): If the $m \times m$ matrix $\mathbf{H}\mathbf{H}^H$ is complex Wishart distributed with $n$
($\ge m$) degrees of freedom (d.o.f.), then:

$$E\left[\log_e|\mathbf{H}\mathbf{H}^H|\right] = \sum_{l=0}^{m-1}\psi(n - l), \tag{30}$$

where $\psi(\cdot)$ is Euler's digamma function, which satisfies

$$\psi(m) = \psi(1) + \sum_{l=1}^{m-1}\frac{1}{l} \tag{31}$$

for positive integers $m$, and $\psi(1) \approx -0.577215$.

This result can be directly applied to chi-squared random variables by noting that a $1 \times 1$ complex Wishart
matrix with $n$ degrees of freedom is $\chi^2_{2n}$:

$$E\left[\log_e\chi^2_{2n}\right] = \psi(n). \tag{32}$$
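Lemma 1 and (31)-(32) translate directly into code. The Python sketch below (helper names are our own) evaluates the digamma function at positive integers via (31), evaluates (30), and checks (32) by Monte Carlo, using the fact that a $1 \times 1$ complex Wishart variable with $n$ d.o.f. is a sum of $n$ independent unit-mean exponentials, i.e., Gamma$(n, 1)$. The sample size and seed are arbitrary choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329      # equals -psi(1)

def digamma_int(m):
    """psi(m) for a positive integer m, via (31)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / l for l in range(1, m))

def expected_logdet_wishart(m, n):
    """E[ln |HH^H|] for an m x m complex Wishart matrix with n >= m d.o.f., via (30)."""
    if n < m:
        raise ValueError("need n >= m degrees of freedom")
    return sum(digamma_int(n - l) for l in range(m))

# Monte Carlo check of (32): the 1 x 1 case is Gamma(n, 1) (the chi^2_{2n} of the text).
rng = random.Random(0)
n, trials = 3, 50000
mc = sum(math.log(rng.gammavariate(n, 1.0)) for _ in range(trials)) / trials
print(f"Monte Carlo E[ln chi2_2n] = {mc:.4f}, psi({n}) = {digamma_int(n):.4f}")
print(f"E[ln|HH^H|] for m = 2, n = 4: {expected_logdet_wishart(2, 4):.4f}")
```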

Using Lemma 1, we can compute the average rate offset in closed form:

Theorem 2: The expected loss in Rayleigh fading due to zero-forcing is given by

$$\bar{\beta}_{\mathrm{DPC-ZF}}(M, KN) = \log_2 e\sum_{j=1}^{KN-1}\frac{j}{M - j} \quad \text{(bps/Hz)}. \tag{33}$$
Proof: Since $\mathbf{H}\mathbf{H}^H$ is $KN \times KN$ Wishart with $M$ d.o.f., and $|g_{k,n}|^2$ is $\chi^2_{2(M-KN+1)}$, Lemma 1
applied to (29) gives:

$$\bar{\beta}_{\mathrm{DPC-ZF}} = \log_2 e\left(E\left[\log_e|\mathbf{H}\mathbf{H}^H|\right] - KN \cdot E\left[\log_e|g_{1,1}|^2\right]\right) \tag{34}$$

$$\bar{\beta}_{\mathrm{DPC-ZF}} = \log_2 e\left(\sum_{l=0}^{KN-1}\psi(M - l) - KN\,\psi(M - KN + 1)\right). \tag{35}$$

By expanding the digamma function and performing the algebraic manipulations shown in Appendix I,
the form (33) can be reached. ■
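Theorem 2 can be checked by evaluating the closed form (33) next to the digamma form (35). A minimal sketch in Python, with our own function names:

```python
import math

EULER_GAMMA = 0.5772156649015329
LOG2E = 1.0 / math.log(2.0)

def digamma_int(m):
    """psi(m) for a positive integer m."""
    return -EULER_GAMMA + sum(1.0 / l for l in range(1, m))

def beta_dpc_zf(M, KN):
    """(33): average DPC-ZF rate loss in bps/Hz, valid for M >= KN."""
    return LOG2E * sum(j / (M - j) for j in range(1, KN))

def beta_dpc_zf_digamma(M, KN):
    """(35): the same quantity written with digamma functions."""
    return LOG2E * (sum(digamma_int(M - l) for l in range(KN)) - KN * digamma_int(M - KN + 1))

for M, KN in [(5, 5), (10, 5)]:
    beta = beta_dpc_zf(M, KN)
    print(f"M = {M:2d}, KN = {KN}: loss = {beta:.2f} bps/Hz, power penalty = {3 * beta / KN:.2f} dB")
```

With these inputs the loop prints losses of about 9.26 and 2.10 bps/Hz, i.e., power penalties of about 5.55 dB and 1.26 dB, which are the values quoted in the discussion of Fig. 2.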
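Using Theorem 2, we easily get an expression for the power offset $\mathcal{L}_\infty^{\mathrm{ZF}}(M, KN)$ by plugging into (27):

$$\mathcal{L}_\infty^{\mathrm{ZF}}(M, KN) = \mathcal{L}_\infty^{\mathrm{DPC}}(M, KN) + \frac{1}{KN}\bar{\beta}_{\mathrm{DPC-ZF}}(M, KN) \tag{36}$$

$$\mathcal{L}_\infty^{\mathrm{ZF}}(M, KN) = \mathcal{L}_\infty^{\mathrm{MIMO}}(KN, M) + \frac{\log_2 e}{KN}\sum_{j=1}^{KN-1}\frac{j}{M - j}, \tag{37}$$

where $\mathcal{L}_\infty^{\mathrm{MIMO}}(KN, M)$ is the power offset of a $KN$-transmit-antenna, $M$-receive-antenna MIMO channel
in iid Rayleigh fading, which is defined in Proposition 1 of [7].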

When the total number of receive antennas is equal to $M$ (i.e., $M = KN$), ZF incurs a rather large
loss relative to DPC that can be approximated as:

$$\bar{\beta}_{\mathrm{DPC-ZF}}(M) \approx M\log_2 M \quad \text{(bps/Hz)} \tag{38}$$

in the sense that the ratio of both sides converges to one as $M$ grows large (see Appendix II for the
proof). In this scenario, the ZF sum rate is associated with the capacity of $M$ parallel $1 \times 1$ (SISO)
channels while the DPC sum rate is associated with an $M \times M$ MIMO channel. This corresponds to a
power offset of $3\log_2 M$ (dB), which is very significant when $M$ is large. Note that the approximation
$3\log_2 M$ (dB) overstates the power penalty by roughly 1 to 1.7 dB for moderate values of $M$ ($3 \le M \le 20$), but does
capture the growth rate. Such a large penalty is not surprising, since the use of zero-forcing requires
inverting the $M \times M$ matrix $\mathbf{H}$, which is poorly conditioned with high probability when $M$ is large.
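The quality of approximation (38) can be inspected numerically by evaluating (33) with $KN = M$. The sketch below is a plain-Python illustration; the list of $M$ values and the function name are arbitrary choices.

```python
import math

def beta_square(M):
    """(33) with KN = M: average DPC-ZF rate loss in bps/Hz."""
    return sum(j / (M - j) for j in range(1, M)) / math.log(2.0)

for M in (5, 10, 20, 100, 1000):
    exact = beta_square(M)
    approx = M * math.log2(M)                      # (38)
    pen_exact = 3.0 * exact / M                    # average power penalty, dB
    pen_approx = 3.0 * math.log2(M)                # approximate power penalty, dB
    print(f"M = {M:4d}: ratio = {exact / approx:.3f}, "
          f"penalty {pen_exact:.2f} dB vs 3 log2 M = {pen_approx:.2f} dB "
          f"(overstated by {pen_approx - pen_exact:.2f} dB)")
```

Since the sum in (33) equals $M H_{M-1} - (M - 1)$ nats, with $H_{M-1}$ the harmonic number, the ratio approaches one only like $1 - 0.42/\ln M$; (38) is therefore best read as a statement about the growth rate.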

We can also consider the asymptotic ZF penalty when the number of transmit antennas is much larger 
than the number of receive antennas. If the number of users and transmit antennas are taken to infinity 
at a fixed ratio according to $M = \alpha KN$ with $KN \to \infty$ for some $\alpha > 1$, then the power offset between
DPC and ZF converges to a constant:

Theorem 3: For $M = \alpha KN$ with $\alpha > 1$, $KN \to \infty$, the asymptotic power penalty for ZF is given by

$$\bar{\Delta}_{\mathrm{DPC-ZF}}(\alpha) = 3\left(-\log_2 e + \alpha\log_2\frac{\alpha}{\alpha - 1}\right) \quad \text{(dB)}. \tag{39}$$
Proof: See Appendix III. ■

This power offset is incurred due to the fact that the DPC sum rate increases according to a $KN \times \alpha KN$
MIMO channel capacity while the ZF sum rate increases according to $KN$ parallel $(\alpha - 1)KN \times 1$
MISO channels. For example, if $\alpha = 2$, i.e., the number of transmit antennas is double the number of
receive antennas, the zero-forcing penalty is no larger than 1.67 dB, and monotonic convergence to this asymptote
is observed. Thus for large systems, ZF is a viable low-complexity alternative to DPC if the number of
transmit antennas can be made suitably large. A similar conclusion was drawn in [17], where the ratio of
the rates achievable with ZF to the sum capacity is studied. Note that using ZF on the MIMO
downlink channel is identical to using a decorrelating receiver on the multiple antenna uplink channel or
in a randomly spread CDMA system; as a result Theorem 3 is identical to the asymptotic performance
of the decorrelating CDMA receiver given in Eq. (152) of [3].
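Theorem 3 can be compared with finite systems by evaluating the finite-size penalty $\frac{3}{KN}\bar{\beta}_{\mathrm{DPC-ZF}}$ from (33) at $M = 2KN$ for growing $KN$. The sketch below (plain Python, illustrative function names) shows the approach to the $\alpha = 2$ limit of about 1.67 dB.

```python
import math

def zf_penalty_db(M, KN):
    """Finite-size average ZF power penalty in dB: (3/KN) times (33)."""
    beta = sum(j / (M - j) for j in range(1, KN)) / math.log(2.0)
    return 3.0 * beta / KN

def zf_penalty_limit_db(alpha):
    """(39): large-system limit for M = alpha*KN, alpha > 1."""
    return 3.0 * (alpha * math.log2(alpha / (alpha - 1.0)) - math.log2(math.e))

alpha = 2
for KN in (5, 10, 50, 200, 1000):
    print(f"KN = {KN:4d}: {zf_penalty_db(alpha * KN, KN):.3f} dB "
          f"(limit {zf_penalty_limit_db(alpha):.3f} dB)")
```

The first row, $KN = 5$, is the $M = 10$ system discussed with Fig. 2 (about 1.26 dB). The finite sum in (33) is a left Riemann sum of the increasing function $x/(\alpha - x)$ over $[0, 1]$, which explains why the finite-size penalty stays below the limit.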

Figure 2 plots the ZF and DPC throughputs for two five-receiver systems. In a five-transmit-antenna/five-receiver
system ($M = K = 5$, $N = 1$), Theorem 2 gives a throughput penalty of 9.26 bps/Hz, which is
equivalent to a power penalty of 5.55 dB (whereas the approximation in (38) gives 6.97 dB). Although
this penalty is exact only in the asymptote, the figure shows that it gives accurate results throughout the
entire SNR range. Throughput curves for an ($M = 10$, $K = 5$, $N = 1$) system are also shown. The ZF
power penalty is only 1.26 dB, which is reasonably close to the asymptotic penalty of 1.67 dB given by
Theorem 3 for $\alpha = 2$. Increasing the number of transmit antennas from 5 to 10 shifts the sum capacity
curve by 5.59 dB, but improves the performance of ZF by 9.88 dB. This is because ZF gains the improvement
in the $\mathcal{L}_\infty$ term of sum capacity (5.59 dB), along with the significantly decreased ZF penalty due to the increased
number of transmit antennas (5.55 dB to 1.26 dB), so that $5.59 + (5.55 - 1.26) = 9.88$ dB. Thus adding transmit antennas has the dual benefit
of increasing the performance of DPC as well as reducing the penalty of using low-complexity ZF.

B. DPC vs. BD 

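We similarly define the rate loss between DPC and BD as:

$$\beta_{\mathrm{DPC-BD}}(\mathbf{H}) \triangleq \lim_{P \to \infty}\left[C_{\mathrm{DPC}}(\mathbf{H}, P) - C_{\mathrm{BD}}(\mathbf{H}, P)\right], \tag{40}$$

and denote the expected loss as $\bar{\beta}_{\mathrm{DPC-BD}} = E_{\mathbf{H}}[\beta_{\mathrm{DPC-BD}}(\mathbf{H})]$. Similar to the analysis for ZF, we can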
calculate the loss terms for a fixed channel and also average over Rayleigh fading. In order to compute 
the average rate loss, we use the fact that the BD sum rate is asymptotically equal to the capacity of K 
parallel N x (M - (K - 1)N) iid Rayleigh MIMO channels [13]. 
[Figure 2: Sum Rate vs. SNR; horizontal axis SNR (dB) from 0 to 30]

Fig. 2. DPC vs. zero-forcing at high SNR



Theorem 4: The expected loss in Rayleigh fading due to block diagonalization is given by

$$\bar{\beta}_{\mathrm{DPC-BD}}(M, N, K) = (\log_2 e)\sum_{k=0}^{K-1}\sum_{n=0}^{N-1}\sum_{i=kN+1}^{(K-1)N}\frac{1}{M - n - i} \quad \text{(bps/Hz)}. \tag{41}$$

Proof: See Appendix IV.

Eq. (41) simplifies to (33) when $N = 1$; i.e., zero-forcing is a special case of block diagonalization. If the number of transmit antennas $M$ is kept fixed but $N$ is increased and $K$ is decreased such that $KN$ is constant, i.e., the number of antennas per receiver is increased but the aggregate number of receive antennas is kept constant, then the rate offset decreases. In the degenerate case $M = N$ and $K = 1$, the channel becomes a point-to-point MIMO channel and the offset is indeed zero. Using the same procedure as for ZF, we can easily obtain an expression for the power offset $\mathcal{L}_\infty^{\text{BD}}(M,K,N)$ (cf. (27)):

$$\mathcal{L}_\infty^{\text{BD}}(M,K,N) = \mathcal{L}_\infty^{\text{DPC}}(KN,M) + \frac{1}{KN}\,\bar{\beta}_{\text{DPC-BD}}(M,K,N). \tag{42}$$
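The claims above about (41) can be checked numerically. The sketch below (hypothetical helper names) evaluates the triple sum of Theorem 4, confirms that it coincides with the Theorem 2 expression when $N = 1$, and shows the loss shrinking as the same $KN = 12$ receive antennas are regrouped into fewer, larger receivers with $M = 12$ fixed, down to zero in the point-to-point case $K = 1$.

```python
import math

LOG2E = 1.0 / math.log(2.0)  # log2(e)


def bd_rate_loss(M, K, N):
    """Theorem 4 / Eq. (41): log2(e) * sum_{k=0}^{K-1} sum_{n=0}^{N-1} sum_{i=kN+1}^{(K-1)N} 1/(M-n-i)."""
    if M < K * N:
        raise ValueError("the analysis assumes M >= K*N")
    total = 0.0
    for k in range(K):
        for n in range(N):
            for i in range(k * N + 1, (K - 1) * N + 1):
                total += 1.0 / (M - n - i)   # M - n - i >= M - KN + 1 >= 1
    return LOG2E * total


def zf_rate_loss(M, KN):
    """Theorem 2: log2(e) * sum_{j=1}^{KN-1} j/(M-j)."""
    return LOG2E * sum(j / (M - j) for j in range(1, KN))


# N = 1: block diagonalization reduces to zero-forcing, so (41) collapses to (33).
print(bd_rate_loss(8, 5, 1), zf_rate_loss(8, 5))

# M = KN = 12 fixed; more antennas per receiver gives a smaller loss.
for N in (1, 2, 3, 4, 6, 12):
    print(N, 12 // N, round(bd_rate_loss(12, 12 // N, N), 4))
```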



Because it is difficult to obtain insight directly from Theorem 4, it is more useful to consider the offset between BD ($K$ receivers with $N$ antennas each) and ZF (equivalent to $KN$ receivers with one antenna each).

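Theorem 5: If $M = \alpha KN$ with $N > 1$ and $\alpha \ge 1$, the expected throughput gain of BD relative to ZF is:

$$\bar{\beta}_{\text{BD-ZF}} = \bar{\beta}_{\text{DPC-ZF}}(M,KN) - \bar{\beta}_{\text{DPC-BD}}(M,K,N) = (\log_2 e)\,K\sum_{j=1}^{N-1}\frac{N-j}{(\alpha-1)KN + j} \text{ (bps/Hz)},$$

and the corresponding power gain is

$$\Delta_{\text{BD-ZF}} = \frac{3\log_2 e}{N}\sum_{j=1}^{N-1}\frac{N-j}{(\alpha-1)KN + j} \text{ (dB)}.$$

Proof: See Appendix V.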
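A direct corollary of this is an expression for the expected power offset when $M = KN$ (i.e., $\alpha = 1$):

$$\Delta_{\text{BD-ZF}} = \frac{3\log_2 e}{N}\sum_{j=1}^{N-1}\frac{N-j}{j} \text{ (dB)}. \tag{43}$$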



Note that this expression depends only on the number of antennas per receiver and is independent of the number of users, i.e., of the system size. For example, consider two system configurations: (i) $M/2$ receivers each have two antennas, and (ii) $M$ receivers each have one antenna. Equation (43) indicates that the power advantage of using BD in the $N = 2$ system is $\Delta_{\text{BD-ZF}} = 2.1640$ dB relative to performing ZF. Since this offset is independent of $M$, it is the same for $M = 4$ and $K = 4, N = 1$ vs. $K = 2, N = 2$ systems as well as for $M = 6$ and $K = 6, N = 1$ vs. $K = 3, N = 2$ systems. To illustrate the utility of the asymptotic rate offsets, sum rates are plotted in Fig. 3 for systems with $M = 12$ and either $N = 3, K = 4$ or $N = 2, K = 6$. Notice that the asymptotic offsets provide insight even at moderate SNR levels (e.g., 10 dB). When $M = 12$, $N = 3$, $K = 4$, $\bar{\beta}_{\text{BD-ZF}} = 14.4270$ bps/Hz and $\Delta_{\text{BD-ZF}} = 3.6067$ dB, while the numerical values are 14.6 bps/Hz and 3.65 dB, respectively.
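To reproduce the numbers above, the following sketch evaluates Theorem 5 and (43), writing $(\alpha-1)KN$ as $M-KN$ so that only integer arguments are needed. It also checks, for a few configurations, that the closed form equals the difference between the Theorem 2 and Theorem 4 expressions, which is the identity established in Appendix V. Function names are illustrative.

```python
import math

LOG2E = 1.0 / math.log(2.0)  # log2(e)


def bd_zf_gain(M, K, N):
    """Theorem 5 in bps/Hz: log2(e) * K * sum_{j=1}^{N-1} (N-j) / (M - KN + j)."""
    return LOG2E * K * sum((N - j) / (M - K * N + j) for j in range(1, N))


def bd_zf_power_gain_db(M, K, N):
    """Power gain in dB: rate gain spread over KN streams at 3 dB per bit."""
    return 3.0 * bd_zf_gain(M, K, N) / (K * N)


def zf_rate_loss(M, KN):
    """Theorem 2, used only for the cross-check."""
    return LOG2E * sum(j / (M - j) for j in range(1, KN))


def bd_rate_loss(M, K, N):
    """Theorem 4, used only for the cross-check."""
    return LOG2E * sum(1.0 / (M - n - i)
                       for k in range(K) for n in range(N)
                       for i in range(k * N + 1, (K - 1) * N + 1))


print(round(bd_zf_gain(12, 4, 3), 4), round(bd_zf_power_gain_db(12, 4, 3), 4))   # 14.427 3.6067
print(round(bd_zf_power_gain_db(4, 2, 2), 4), round(bd_zf_power_gain_db(6, 3, 2), 4))  # 2.164 2.164
```

The two N = 2 systems give the same power gain, which is the system-size independence noted above.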



C. Unequal Average SNRs

The analysis so far has assumed that the channel gains have the same strength for all users. However, near-far effects in a typical wireless broadcast channel lead to asymmetric channel gains. In this subsection, we consider the effect of asymmetric channel gains, i.e., unequal average SNRs, and reformulate the rate offsets (33) and (41).

We assume that the channel of each user can be decomposed as

$$\mathbf{H}_k = \sqrt{\gamma_k}\,\tilde{\mathbf{H}}_k, \quad k = 1,\dots,K, \tag{44}$$

where $\gamma_k$ denotes the average SNR of user $k$. The elements of $\tilde{\mathbf{H}}_k$ are Gaussian with zero mean and unit variance. Notice that the quantities with a tilde ($\tilde{\cdot}$) are derived under this zero-mean, unit-variance Gaussian assumption. Then the channel model (1) becomes

$$\mathbf{y}_k = \sqrt{\gamma_k}\,\tilde{\mathbf{H}}_k\mathbf{x} + \mathbf{n}_k. \tag{45}$$

[Figure: sum rate vs. SNR; x-axis: SNR (dB), 10 to 50]

Fig. 3. Comparison of (41) and (43) with simulated rate losses and power offsets

In the preceding discussion, we have used the fact that uniform power allocation is asymptotically optimal for DPC at high SNR. It is important to note that uniform power allocation remains asymptotically optimal even when the users' SNRs are asymmetric. Since $\mathbf{H} = (\operatorname{diag}(\sqrt{\gamma_1},\dots,\sqrt{\gamma_K}) \otimes \mathbf{I}_{N\times N})\tilde{\mathbf{H}}$, where $\tilde{\mathbf{H}} = [\tilde{\mathbf{H}}_1^T \cdots \tilde{\mathbf{H}}_K^T]^T$ and $\otimes$ denotes the Kronecker product, the aggregate channel $\mathbf{H}$ is full rank when $M \ge KN$. Thus, Theorem 1 holds. When ZF or BD is used, the effective channels are simply multiplied by the corresponding $\sqrt{\gamma_k}$.

Starting from the DPC sum-rate expression and the ZF and BD sum rates (20) and (23), all with uniform power allocation, we can derive the sum rates for asymmetric channel gains as follows:

$$C_{\text{DPC}}(\mathbf{H},P) \approx \tilde{C}_{\text{DPC}}(\tilde{\mathbf{H}},P) + N\sum_{k=1}^{K}\log_2\gamma_k, \tag{46}$$

$$C_{\text{ZF}}(\mathbf{H},P) \approx \tilde{C}_{\text{ZF}}(\tilde{\mathbf{H}},P) + N\sum_{k=1}^{K}\log_2\gamma_k, \tag{47}$$

$$C_{\text{BD}}(\mathbf{H},P) \approx \tilde{C}_{\text{BD}}(\tilde{\mathbf{H}},P) + N\sum_{k=1}^{K}\log_2\gamma_k, \tag{48}$$

where $\tilde{C}_{\text{DPC}}(\tilde{\mathbf{H}},P)$, $\tilde{C}_{\text{ZF}}(\tilde{\mathbf{H}},P)$, and $\tilde{C}_{\text{BD}}(\tilde{\mathbf{H}},P)$ are the sum rates under the symmetric channel gain scenario. The derivation of (46), (47), and (48) can be found in Appendix VI. As a result, it is easy to see that the DPC-ZF and DPC-BD offsets are unaffected by $\gamma_1,\dots,\gamma_K$.

Fig. 4. Average sum rates with the optimal power allocation and the uniform power allocation when M = 4, N = 1, K = 4, with unequal SNR: $\gamma_1 = 0.1$, $\gamma_2 = 0.5$, $\gamma_3 = 1$, $\gamma_4 = 2$.
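A minimal illustration of (47), assuming $N = 1$ and fixed, hypothetical normalized effective gains $|\tilde{g}_k|^2$ instead of Rayleigh draws: scaling each user's effective gain by $\gamma_k$ shifts the uniform-power ZF sum rate by an amount that approaches $\sum_k \log_2\gamma_k$ as $P$ grows. The $\gamma_k$ values are those of the Fig. 4 example; everything else is made up for illustration.

```python
import math


def zf_sum_rate_uniform(gammas, gtilde_sq, P):
    """Uniform-power ZF sum rate with N = 1: sum_k log2(1 + (P/K) * gamma_k * |g~_k|^2)."""
    K = len(gammas)
    return sum(math.log2(1.0 + (P / K) * gk * g2) for gk, g2 in zip(gammas, gtilde_sq))


gammas = [0.1, 0.5, 1.0, 2.0]       # average SNRs of the Fig. 4 example
gtilde_sq = [0.8, 1.3, 0.4, 2.1]    # hypothetical normalized ZF gains |g~_k|^2
target = sum(math.log2(g) for g in gammas)   # N * sum_k log2(gamma_k) with N = 1

for P_db in (10, 30, 60, 90):
    P = 10 ** (P_db / 10)
    shift = (zf_sum_rate_uniform(gammas, gtilde_sq, P)
             - zf_sum_rate_uniform([1.0] * len(gammas), gtilde_sq, P))
    print(P_db, round(shift, 4), round(target, 4))
```

Because DPC is shifted by the same amount in (46), the difference between the two schemes, and hence the offset, does not depend on the $\gamma_k$.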

Theorem 6: At high SNR, the expected loss in Rayleigh fading when each user has a different average SNR is identical to the loss when all users have the same average SNR. That is,

$$\bar{\beta}_{\text{DPC-ZF}}(M,KN) = \log_2 e\sum_{j=1}^{KN-1}\frac{j}{M-j} \text{ (bps/Hz)}, \tag{49}$$

and

$$\bar{\beta}_{\text{DPC-BD}}(M,K,N) = (\log_2 e)\left(\sum_{k=0}^{K-1}\sum_{n=0}^{N-1}\sum_{i=kN+1}^{(K-1)N}\frac{1}{M-n-i}\right) \text{ (bps/Hz)}, \tag{50}$$

which are identical to (33) and (41), respectively.

Fig. 4 illustrates that the difference between the sum rates achieved with optimal power allocation and with uniform power allocation tends to zero as the power grows, for both DPC and ZF. Unlike the symmetric channel gain case, more transmit power is required to make the difference sufficiently small.

VI. Weighted Sum Rate Analysis 

In this section we generalize the rate offset analysis to weighted sum rate. We first consider single-antenna receivers ($N = 1$), and then discuss the extension to $N > 1$ at the end of this section. Fig. 5 illustrates the capacity region (DPC) and the ZF achievable region for a particular two-user channel at 30 dB. While the sum-rate point is where the boundary of the rate region has slope $-1$, weighted sum-rate points are those where the slope of the boundary is set by the particular choice of user weights (slope $-\mu_1/\mu_2$ for weights $\mu_1, \mu_2$). The sum rate offset quantifies the difference between the sum-rate points of the two regions; the weighted sum rate offset is intended to describe the offset for the other portions of the rate region.

Fig. 5. Achievable regions of DPC and ZF for M = 2, N = 1, and K = 2, with $\mathbf{h}_1 = [1\ \ 0.5]$, $\mathbf{h}_2 = [0.5\ \ 1]$ at SNR = 30 dB.

We first show that allocating power in proportion to user weights is asymptotically optimal for both DPC and ZF, and then use this result to compute the associated rate offsets. Finally, we show the utility of this simple power allocation policy via an application to queue-based scheduling.

A. Asymptotically Optimal Power Allocation 

Without loss of generality, we assume the user weights are in descending order, $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_K > 0$, with $\sum_{k=1}^{K}\mu_k = 1$. The maximum weighted sum rate problem (DPC), defined as the maximum of $\sum_{k=1}^{K}\mu_k R_k$ over the capacity region, can be written in terms of the dual MAC as:

$$C_{\text{DPC}}(\boldsymbol{\mu},\mathbf{H},P) = \max_{\sum_{k}P_k \le P}\ \sum_{k=1}^{K}\mu_k\log_2\!\left(1 + P_k\,\mathbf{h}_k(\mathbf{A}^{(k-1)})^{-1}\mathbf{h}_k^H\right), \tag{51}$$

where $\mathbf{A}^{(k-1)} = \mathbf{I} + \sum_{j=1}^{k-1}P_j\mathbf{h}_j^H\mathbf{h}_j$ for $k > 1$ and $\mathbf{A}^{(0)} = \mathbf{I}$. Since $N = 1$, each channel is a row vector and is written as $\mathbf{h}_k$. Notice that the uplink decoding is done in order of increasing weight, i.e., user $K$ does not get the benefit of any interference cancellation, while user 1's signal benefits from full interference cancellation and is thus detected in the presence of only noise.

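The following lemma shows that if we limit ourselves to linear power allocation policies of the form $P_k = \alpha_k P$, then the objective function in (51) can be decoupled at high SNR:

Lemma 2: If $M \ge K$, then for any $\alpha_k > 0$, $k = 1,\dots,K$, with $\sum_{k=1}^{K}\alpha_k = 1$,

$$\lim_{P\to\infty}\left[\sum_{k=1}^{K}\log_2\!\left(1 + \alpha_k P\,\mathbf{h}_k(\mathbf{A}^{(k-1)})^{-1}\mathbf{h}_k^H\right) - \sum_{k=1}^{K}\log_2\!\left(1 + \alpha_k P\|\mathbf{f}_k\|^2\right)\right] = 0, \tag{52}$$

where $\mathbf{f}_k$ is the projection of $\mathbf{h}_k$ onto the nullspace of $\{\mathbf{h}_1,\dots,\mathbf{h}_{k-1}\}$.

Proof: Lemma 3 in Appendix VII shows that $\mathbf{h}_k(\mathbf{A}^{(k-1)})^{-1}\mathbf{h}_k^H \to \|\mathbf{f}_k\|^2$ as $P \to \infty$. Since $\|\mathbf{f}_k\|^2 > 0$ with probability one when $M \ge K$, the ratio of the arguments of corresponding logarithms in (52) tends to one as $P \to \infty$, and by the continuity of $\log(\cdot)$ each term of the difference vanishes. ■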
Once the weighted sum rate maximization has been decoupled into the problem of maximizing weighted sum rate over parallel single-user channels, we can use the result of [6] to show that the optimal power allocation is of the form $P_k^* = \mu_k P + O(1)$.

Theorem 7: When $M \ge K$, allocating power according to

$$P_k = \mu_k P, \quad k = 1,\dots,K, \tag{53}$$

asymptotically achieves the optimal solution to (51) at high SNR.

Proof: By Lemma 2, the following optimization will yield an asymptotically optimal solution (albeit with a weak restriction on allowable power policies):

$$\max_{\sum_{k}P_k \le P}\ \sum_{k=1}^{K}\mu_k\log_2\!\left(1 + P_k\|\mathbf{f}_k\|^2\right). \tag{54}$$

The optimal power policy for a more general version of this problem, in which there are more than $K$ parallel channels and each user can occupy multiple channels, is solved in [6]. We need only consider this simplified version, and it is easily checked (via the KKT conditions) that the solution to (54) is

$$P_k^* = \mu_k\left(P + \sum_{j=1}^{K}\frac{1}{\|\mathbf{f}_j\|^2}\right) - \frac{1}{\|\mathbf{f}_k\|^2}, \quad k = 1,\dots,K, \tag{55}$$

when $P$ is sufficiently large to allow all users to have non-zero power. Therefore, at high SNR we have

$$P_k^* = \mu_k P + O(1), \quad k = 1,\dots,K. \tag{56}$$

Since the $O(1)$ power term causes a rate difference that vanishes as $P \to \infty$, we have the result. ■
Theorem 7 generalizes the fact that uniform power allocation asymptotically achieves the maximum sum rate at high SNR: for the sum rate problem the weights are equal ($\mu_1 = \cdots = \mu_K = 1/K$), so the uniform power policy is asymptotically optimal.
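The following sketch solves (54) exactly for one hypothetical realization of $\|\mathbf{f}_k\|^2$ by bisecting on the common KKT level (a standard weighted water-filling, not necessarily the procedure used in [6]) and compares it with the proportional rule (53). Under these assumptions, the optimal allocation approaches $\mu_k P$ at high power, agrees with the closed form (55) once all users are active, and the weighted-rate gap vanishes.

```python
import math


def weighted_waterfill(mu, gains, P, iters=200):
    """Maximize sum_k mu_k*log2(1 + p_k*gains_k) subject to sum_k p_k = P, p_k >= 0.

    The KKT conditions give p_k = max(0, mu_k*w - 1/gains_k) for a common level w,
    which is found here by bisection on the total power used."""
    def used(w):
        return sum(max(0.0, m * w - 1.0 / g) for m, g in zip(mu, gains))

    lo, hi = 0.0, 1.0
    while used(hi) < P:   # enlarge the bracket until it contains the solution
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if used(mid) < P:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    return [max(0.0, m * w - 1.0 / g) for m, g in zip(mu, gains)]


def weighted_rate(mu, gains, powers):
    """Objective of (54) for a given power vector."""
    return sum(m * math.log2(1.0 + p * g) for m, g, p in zip(mu, gains, powers))


mu = [0.6, 0.4]      # user weights of the Fig. 6 example
gains = [1.5, 0.7]   # hypothetical ||f_k||^2 for a single channel realization

for P_db in (0, 10, 20, 40, 60):
    P = 10 ** (P_db / 10)
    opt = weighted_waterfill(mu, gains, P)
    prop = [m * P for m in mu]                  # rule (53): P_k = mu_k * P
    gap = weighted_rate(mu, gains, opt) - weighted_rate(mu, gains, prop)
    print(P_db, [round(p / P, 4) for p in opt], round(gap, 6))
```

At 0 dB the weaker user is switched off by the water-filling solution, which is the regime where (55) does not apply.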
Fig. 6. Averaged weighted sum rate difference between the exact solution (51) and the asymptotic solution (53) when $\mu_1 = 0.6$ and $\mu_2 = 0.4$ for a Rayleigh fading channel.

In Fig. 6, the difference between the true weighted sum rate (51) and the weighted sum rate achieved using $P_k = \mu_k P$ is plotted as a function of SNR. This difference is averaged over iid Rayleigh channel realizations for an $(M = 4, K = 2, N = 1)$ system with $\mu_1 = 0.6$ and $\mu_2 = 0.4$. The approximate power allocation is seen to give a weighted sum rate that is extremely close to the optimum even at very low SNR values.

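Meanwhile, the weighted sum rate achieved by ZF is given by

$$C_{\text{ZF}}(\boldsymbol{\mu},\mathbf{H},P) = \max_{\sum_{k}P_k \le P}\ \sum_{k=1}^{K}\mu_k\log_2\!\left(1 + P_k\|\mathbf{g}_k\|^2\right), \tag{57}$$

where $\mathbf{g}_k$ is the projection of $\mathbf{h}_k$ onto the null space of $\{\mathbf{h}_1,\dots,\mathbf{h}_{k-1},\mathbf{h}_{k+1},\dots,\mathbf{h}_K\}$. The result of [6] directly applies here, and therefore the power allocation policy in (53) is also the asymptotic solution to (57).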
Fig. 7. MIMO BC with queues



B. Rate Loss 

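Using the asymptotically optimal power allocation policy of (53), the weighted sum rates of DPC and ZF can be expressed as

$$C_{\text{DPC}}(\boldsymbol{\mu},\mathbf{H},P) \approx \sum_{k=1}^{K}\mu_k\log_2\!\left(1 + \mu_k P\|\mathbf{f}_k\|^2\right), \tag{58}$$

$$C_{\text{ZF}}(\boldsymbol{\mu},\mathbf{H},P) \approx \sum_{k=1}^{K}\mu_k\log_2\!\left(1 + \mu_k P\|\mathbf{g}_k\|^2\right). \tag{59}$$

Thus, the rate offset per realization is given by

$$\beta_{\text{DPC-ZF}}(\boldsymbol{\mu},\mathbf{H}) = \sum_{k=1}^{K}\mu_k\log_2\frac{\|\mathbf{f}_k\|^2}{\|\mathbf{g}_k\|^2}. \tag{60}$$

In Rayleigh fading, the distributions of $\|\mathbf{f}_k\|^2$ and $\|\mathbf{g}_k\|^2$ are $\chi^2_{2(M-k+1)}$ and $\chi^2_{2(M-K+1)}$, respectively. Therefore, the expected rate loss is given by

$$\bar{\beta}_{\text{DPC-ZF}}(\boldsymbol{\mu},M,K) = (\log_2 e)\sum_{k=1}^{K}\mu_k\left(\sum_{j=M-K+1}^{M-k}\frac{1}{j}\right). \tag{61}$$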

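It is straightforward to check that the rate offset is minimized at the sum rate, i.e., when $\mu_1 = \cdots = \mu_K = 1/K$. If we let $C_k = \sum_{j=M-K+1}^{M-k} 1/j$, then $C_1 > C_2 > \cdots > C_K = 0$ and $\bar{\beta}_{\text{DPC-ZF}} = (\log_2 e)\sum_{k=1}^{K}\mu_k C_k$. The weights are constrained by $\mu_1 \ge \cdots \ge \mu_K$, $\sum_{k=1}^{K}\mu_k = 1$, and $\mu_k \ge 0$ ($1 \le k \le K$), so $\{\mu_k\}$ and $\{C_k\}$ are similarly ordered and Chebyshev's sum inequality gives $\sum_k\mu_k C_k \ge \frac{1}{K}\sum_k C_k$. Hence $\bar{\beta}_{\text{DPC-ZF}}$ achieves its minimum at $\mu_1 = \cdots = \mu_K = 1/K$ for a given $\{C_k\}$.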
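Eq. (61) is easy to evaluate directly. The sketch below (illustrative names, $N = 1$) computes it for several ordered weight vectors and confirms that, with equal weights, $K$ times the weighted loss equals the sum-rate loss of Theorem 2, as it should since the weighted sum rate is then the sum rate divided by $K$.

```python
import math

LOG2E = 1.0 / math.log(2.0)  # log2(e)


def weighted_zf_loss(mu, M):
    """Eq. (61) with N = 1 and K = len(mu) <= M:
    log2(e) * sum_k mu_k * C_k, where C_k = sum_{j=M-K+1}^{M-k} 1/j for k = 1..K."""
    K = len(mu)
    C = [sum(1.0 / j for j in range(M - K + 1, M - k + 1)) for k in range(1, K + 1)]
    return LOG2E * sum(m * c for m, c in zip(mu, C))


def zf_rate_loss(M, K):
    """Sum-rate loss of Theorem 2 with N = 1."""
    return LOG2E * sum(j / (M - j) for j in range(1, K))


for mu in ([0.5, 0.5], [0.6, 0.4], [0.9, 0.1]):
    print(mu, round(weighted_zf_loss(mu, 4), 4))
```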

C. Application to Queue-based Scheduling 

Queue-based scheduling, introduced in the seminal work of Tassiulas and Ephremides [18], is one application in which it is necessary to repeatedly maximize the weighted sum rate for different user weights. Fig. 7 illustrates a queue-based scheduling system for two users. Data for the users arrive at rates $\lambda_1$ and $\lambda_2$, which are generally assumed to be unknown. During each time slot, the transmitter chooses the rate vector that maximizes the weighted sum rate over the instantaneous rate region with weights equal to the current queue sizes. If the queue lengths are denoted as $q_1(t)$ and $q_2(t)$, then the transmitter solves the following optimization during each time slot:

$$\max_{\mathbf{R}\in\mathcal{C}(\mathbf{H},P)}\ q_1(t)R_1 + q_2(t)R_2, \tag{62}$$

and such a policy stabilizes the system for any arrival rate vector in the interior of the ergodic capacity region.

Although the weighted sum rate maximization problem for DPC stated in (51) is convex, it still requires considerable complexity and could be difficult to perform on a slot-by-slot basis. An alternative is to use the approximate power allocation policy from (53) during each time slot:

$$P_k = \frac{q_k(t)}{q_1(t) + q_2(t)}\,P, \tag{63}$$

where the ordering of the queues determines the dual MAC decoding order (larger queue decoded last).
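A minimal sketch of the per-slot rule (63), written for an arbitrary number of users: power is split in proportion to queue lengths, and users are ordered for dual-MAC decoding with the largest queue decoded last. The function name is hypothetical, and allocating no power when every queue is empty is our own assumption; the text does not address that case.

```python
def approx_slot_policy(queues, P):
    """Per-slot approximation (63): P_k = q_k / sum_j q_j * P.

    Returns (powers, decode_order), where decode_order lists user indices from the
    first decoded to the last decoded in the dual MAC (increasing queue length)."""
    total = sum(queues)
    if total == 0:
        # Assumption: nothing to send, so no power is allocated.
        return [0.0] * len(queues), list(range(len(queues)))
    powers = [q / total * P for q in queues]
    decode_order = sorted(range(len(queues)), key=lambda k: queues[k])
    return powers, decode_order


powers, order = approx_slot_policy([30, 10], 100.0)
print(powers, order)   # [75.0, 25.0] [1, 0]
```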

Although we do not yet have any analytical results on the performance of the asymptotically optimal 
power policy, numerical results indicate that such a policy performs nearly as well as actually maximizing 
weighted sum rate. Ongoing work is investigating whether the approximate strategy is stabilizing for this 
system. 

In Fig. 8, average queue length is plotted versus the sum arrival rate for an $M = 4$, $K = 2$ channel at 10 dB, for both the exact weighted sum rate maximization and the approximation. The two schemes perform nearly identically, and the approximate algorithm appears to stabilize the system in this scenario, although this is only empirical evidence.

D. Extension to N > 1 

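Similar to (51), the weighted sum rate of DPC can be written as

$$C_{\text{DPC}}(\boldsymbol{\mu},\mathbf{H},P) = \max_{\sum_{k=1}^{K}\operatorname{tr}(\mathbf{Q}_k)\le P}\ \sum_{k=1}^{K}\mu_k\log_2\frac{|\mathbf{A}^{(k)}|}{|\mathbf{A}^{(k-1)}|}, \tag{64}$$

where $\mathbf{A}^{(k)} = \mathbf{I} + \sum_{j=1}^{k}\mathbf{H}_j^H\mathbf{Q}_j\mathbf{H}_j$ for $k \ge 1$ and $\mathbf{A}^{(0)} = \mathbf{I}$. From the construction of $\mathbf{A}^{(k)}$ (using $|\mathbf{I}+\mathbf{AB}| = |\mathbf{I}+\mathbf{BA}|$),

$$\frac{|\mathbf{A}^{(k)}|}{|\mathbf{A}^{(k-1)}|} = \left|\mathbf{I} + \mathbf{Q}_k\mathbf{H}_k(\mathbf{A}^{(k-1)})^{-1}\mathbf{H}_k^H\right|.$$

Hence, (64) can be written as

$$C_{\text{BC}}(\boldsymbol{\mu},\mathbf{H},P) = \max_{\sum_{k=1}^{K}\operatorname{tr}(\mathbf{Q}_k)\le P}\ \sum_{k=1}^{K}\mu_k\log_2\left|\mathbf{I} + \mathbf{Q}_k\mathbf{H}_k(\mathbf{A}^{(k-1)})^{-1}\mathbf{H}_k^H\right|. \tag{65}$$

With the decoupling lemma (see Appendix VII), the above optimization (65) can be solved asymptotically as in the case of $N = 1$: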
Fig. 8. Average queue length for symmetric arrival.



Theorem 8: At high SNR, the optimum of (64) is asymptotically achieved by

$$\mathbf{Q}_k = \frac{\mu_k P}{N}\mathbf{I}, \quad k = 1,\dots,K. \tag{66}$$

Proof: See Appendix VIII.
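Similarly, the weighted sum rate of BD is given by

$$C_{\text{BD}}(\boldsymbol{\mu},\mathbf{H},P) = \max_{\mathbf{Q}_k:\ \sum_{k=1}^{K}\operatorname{tr}(\mathbf{Q}_k)\le P}\ \sum_{k=1}^{K}\mu_k\log_2\left|\mathbf{I} + \mathbf{Q}_k\mathbf{G}_k\mathbf{G}_k^H\right|, \tag{67}$$

where $\mathbf{G}_k$ is the projection of $\mathbf{H}_k$ onto the null space of $\{\mathbf{H}_1,\dots,\mathbf{H}_{k-1},\mathbf{H}_{k+1},\dots,\mathbf{H}_K\}$. Likewise, the optimization (67) is the same as the optimizations (76) and (77) in Appendix VIII except that $\mathbf{F}_k$ is replaced by $\mathbf{G}_k$, which does not affect the asymptotic solution. Thus, the power allocation policy in (66) is also the asymptotic solution to (67).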

E. More Users Than Antennas

Although it is asymptotically optimal to allocate power in proportion to user weights when M > KN, 
this is not the case when M < KN. Indeed, such a strategy can easily be checked to be sub-optimal even 
for a single antenna broadcast channel with more than one user, as considered in [19][20]. Allocating 
power directly proportional to user weights or allocating all power to only the user with the largest weight 
yields, for many single antenna broadcast channels, a weighted sum rate that is a bounded distance away 
from the true optimal weighted sum rate. 
Fig. 9. Ergodic weighted sum rates by DPC and by approximations when M = 2, N = 1, K = 3, with $\mu_1 = 0.5$, $\mu_2 = 0.3$, and $\mu_3 = 0.2$.



Although neither of these strategies is asymptotically optimal, numerical results show that these approximations achieve rates that are extremely close to the optimum. In general, there are two reasonable power approximations. The first is to simply choose $P_k = \mu_k P$. However, when $K > M$, this results in sending many more data streams than there are spatial dimensions, which is not particularly intuitive. An alternative strategy is to allocate power only to the users with the $M$ largest weights, again in proportion to their weights.

Fig. 9 illustrates the ergodic weighted sum rates vs. SNR for a $K = 3$, $M = 2$, $N = 1$ system in which $\mu_1 = 0.5$, $\mu_2 = 0.3$, and $\mu_3 = 0.2$, averaged over Rayleigh fading. The true weighted sum rate is compared to the first strategy, where $P_k = \mu_k P$, and to the second strategy, where only users 1 and 2 are allocated power according to $P_1 = \frac{\mu_1}{\mu_1+\mu_2}P$, $P_2 = \frac{\mu_2}{\mu_1+\mu_2}P$, and $P_3 = 0$. Both approximations are a non-zero distance away from the optimum, but the rate loss is seen to be extremely small.
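The two heuristics can be written in a few lines. This sketch (hypothetical function names) reproduces the power splits used for Fig. 9 for a total power of 10; ties among equal weights are broken by user index, which is an arbitrary choice not specified in the text.

```python
def proportional_all(mu, P):
    """Strategy 1: P_k = mu_k * P for every user."""
    return [m * P for m in mu]


def proportional_top_m(mu, P, M):
    """Strategy 2: power only to the M users with the largest weights,
    still in proportion to their weights; all other users get zero power."""
    top = sorted(range(len(mu)), key=lambda k: mu[k], reverse=True)[:M]
    w = sum(mu[k] for k in top)
    return [mu[k] / w * P if k in top else 0.0 for k in range(len(mu))]


mu = [0.5, 0.3, 0.2]   # weights of the Fig. 9 example (K = 3, M = 2, N = 1)
print(proportional_all(mu, 10.0))
print(proportional_top_m(mu, 10.0, M=2))
```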

VII. Conclusion 

We have investigated the difference between the throughputs achieved by dirty paper coding (DPC) and those achieved with linear precoding strategies by using the affine high-SNR approximation and computing the exact throughput/power offsets at asymptotically high SNR for MIMO broadcast channels in which the number of transmit antennas is no smaller than the total number of receive antennas. Simple expressions in terms of the number of transmit and receive antennas are provided for the average rate/power offset in a spatially white Rayleigh fading environment. When the aggregate number of receive antennas is equal to or slightly less than the number of transmit antennas, linear precoding incurs a rather significant penalty relative to DPC, but this penalty is much smaller when the number of transmit antennas is large relative to the number of receive antennas.

Furthermore, we generalized our analysis to weighted sum rate and quantified the asymptotic rate/power offsets for this scenario as well. One of the most interesting aspects of this extension is the finding that allocating power directly proportional to user weights is asymptotically optimal for DPC at high SNR. This result is an extension of a similar result for parallel channels found in [6], and this simple yet asymptotically optimal power policy may prove useful in other settings such as opportunistic scheduling.
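Appendix I
Proof of Theorem 2

Starting from (35) and utilizing $\psi(m) = -\gamma_{\mathrm{E}} + \sum_{l=1}^{m-1}\frac{1}{l}$, where $\gamma_{\mathrm{E}}$ is the Euler-Mascheroni constant, we have:

$$\begin{aligned}
\bar{\beta}_{\text{DPC-ZF}}(M,KN) &= (\log_2 e)\left(\sum_{l=0}^{KN-1}\psi(M-l) - KN\,\psi(M-KN+1)\right)\\
&= (\log_2 e)\sum_{l=0}^{KN-1}\ \sum_{i=M-KN+1}^{M-l-1}\frac{1}{i}\\
&= (\log_2 e)\sum_{j=1}^{KN-1}\frac{j}{M-j}.
\end{aligned}$$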
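Appendix II
Derivation of (38)

Let $S_M$ denote the expected rate loss, in nats, with $M = KN$ transmit antennas. From Theorem 2,

$$S_M = \frac{\bar{\beta}_{\text{DPC-ZF}}(M,M)}{\log_2 e} = \sum_{j=1}^{M-1}\frac{j}{M-j} = \sum_{i=1}^{M-1}\frac{M-i}{i} = \sum_{i=1}^{M-1}\sum_{j=1}^{i}\frac{1}{j}. \tag{68}$$

From the last form of (68), we have:

$$S_{M+1} - S_M = \sum_{j=1}^{M}\frac{1}{j} < 1 + \log_e M, \quad \text{for } M > 1, \tag{69}$$

since $\log_e M = \int_1^M \frac{1}{x}\,dx > \sum_{j=2}^{M}\frac{1}{j}$ because $1/x$ is a decreasing function (for $M = 1$ the two sides of (69) are equal). If we let $f(M) = M\log_e M$, then $f'(M) = 1 + \log_e M$, which is an increasing function of $M$, and thus $f(M+1) > f(M) + 1 + \log_e M$. Since $S_{M+1} - S_M \le 1 + \log_e M$ and $f(1) = S_1 = 0$, induction gives $S_M < M\log_e M$ for all $M > 1$.

Now we show that $S_M / (M\log_e M)$ converges to 1. We do this by showing that $S_M > \theta M\log_e M$ for any $0 < \theta < 1$ and all $M$ larger than some $M_0$. First notice that $\log_e M < \sum_{j=1}^{M-1}\frac{1}{j} < \sum_{j=1}^{M}\frac{1}{j}$ by the definition of the $\log(\cdot)$ function. Thus,

$$S_{M+1} - S_M = \sum_{j=1}^{M}\frac{1}{j} > \log_e M. \tag{70}$$

Let $g(M) = \theta M\log_e M$ for some $0 < \theta < 1$. Then $g'(M) = \theta + \theta\log_e M$, which is an increasing function of $M$. Thus $g(M+1) < g(M) + g'(M+1) = g(M) + \theta + \theta\log_e(M+1)$. Therefore we have

$$g(M+1) - S_{M+1} < \left(g(M) - S_M\right) + \theta + \theta\log_e(M+1) - \log_e M. \tag{71}$$

Notice that the term $\theta + \theta\log_e(M+1) - \log_e M$ is a monotonically decreasing function of $M$ that goes to $-\infty$. Thus, any positive gap between $g(M)$ and $S_M$ must close and go to $-\infty$, i.e., $S_M > g(M)$ for sufficiently large $M$, or $\liminf_{M\to\infty} S_M/(M\log_e M) \ge \theta$ for every $\theta < 1$. Combined with $S_M < M\log_e M$, this gives

$$\lim_{M\to\infty}\frac{S_M}{M\log_e M} = 1, \tag{72}$$

as desired.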
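The limit (72) is approached slowly. The following sketch computes $S_M$ from (68), checks the increment identity behind (69) and the bound $S_M < M\log_e M$, and prints the ratio $S_M/(M\log_e M)$, which creeps toward 1 from below. It also compares $3\log_2 M$ dB, the power-offset approximation implied by $S_M \approx M\log_e M$ (assumed here to be the form of (38)), with the exact value for $M = 5$, recovering the 6.97 dB and 5.55 dB quoted earlier for the $M = K = 5$ system.

```python
import math


def S(M):
    """S_M from (68): sum_{j=1}^{M-1} j/(M-j), the DPC-ZF loss in nats when M = KN."""
    return sum(j / (M - j) for j in range(1, M))


def harmonic(M):
    """H_M = sum_{j=1}^{M} 1/j."""
    return sum(1.0 / j for j in range(1, M + 1))


for M in (2, 5, 10, 100, 1000, 10000):
    print(M, round(S(M) / (M * math.log(M)), 4))

M = 5
approx_db = 3 * math.log2(M)                   # offset implied by S_M ~ M ln M
exact_db = 3 * S(M) / (M * math.log(2.0))      # 3 dB per bit over M streams
print(round(approx_db, 2), round(exact_db, 2))  # 6.97 5.55
```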
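Appendix III
Proof of Theorem 3

From Theorem 2, if $M = \alpha KN$, the expected power offset, which is now a function of $\alpha$ and $KN$, can be expressed as:

$$\Delta_{\text{DPC-ZF}}(\alpha,KN) = \frac{3\log_2 e}{KN}\sum_{j=1}^{KN-1}\frac{j}{M-j}\bigg|_{M=\alpha KN} = 3\log_2 e\sum_{j=1}^{KN-1}\frac{j/KN}{\alpha - j/KN}\cdot\frac{1}{KN}.$$

Let us define a function $f(x)$ as

$$f(x) = \frac{x}{\alpha - x}, \quad x \in [0,1],\ \alpha > 1.$$

Then $\Delta_{\text{DPC-ZF}}$ can be expressed as

$$\Delta_{\text{DPC-ZF}}(\alpha,KN) = 3\log_2 e\sum_{j=1}^{KN-1}f\!\left(\frac{j}{KN}\right)\frac{1}{KN},$$

which is a Riemann sum; i.e., as $KN \to \infty$,

$$\lim_{KN\to\infty}\Delta_{\text{DPC-ZF}}(\alpha,KN) = 3\log_2 e\int_0^1 f(x)\,dx = 3\log_2 e\int_0^1\frac{x}{\alpha - x}\,dx.$$

Thus,

$$\Delta_{\text{DPC-ZF}}(\alpha) = \lim_{KN\to\infty}\Delta_{\text{DPC-ZF}}(\alpha,KN) = 3\log_2 e\left(-1 - \alpha\log_e\frac{\alpha-1}{\alpha}\right) = -3\left(\log_2 e + \alpha\log_2\!\left(1 - \frac{1}{\alpha}\right)\right).$$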

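Appendix IV
Proof of Theorem 4

From (12) and (24), $\bar{\beta}_{\text{DPC-BD}}$ is given by:

$$\bar{\beta}_{\text{DPC-BD}} = \mathbb{E}\left[\log_2|\mathbf{H}\mathbf{H}^H|\right] - K\,\mathbb{E}\left[\log_2|\mathbf{G}_k\mathbf{G}_k^H|\right],$$

where $\mathbf{G}_k\mathbf{G}_k^H$ is Wishart with $M-(K-1)N$ degrees of freedom. Applying Lemma 1 and expanding the digamma function, we have:

$$\begin{aligned}
\bar{\beta}_{\text{DPC-BD}} &= \log_2 e\left(\sum_{l=0}^{KN-1}\psi(M-l) - K\sum_{n=0}^{N-1}\psi(M-(K-1)N-n)\right)\\
&= \log_2 e\sum_{k=0}^{K-1}\sum_{n=0}^{N-1}\left[\psi(M-n-kN) - \psi(M-n-(K-1)N)\right]\\
&= \log_2 e\sum_{k=0}^{K-1}\sum_{n=0}^{N-1}\ \sum_{j=M-n-(K-1)N}^{M-n-kN-1}\frac{1}{j}\\
&= \log_2 e\sum_{k=0}^{K-1}\sum_{n=0}^{N-1}\ \sum_{i=kN+1}^{(K-1)N}\frac{1}{M-n-i}.
\end{aligned}$$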

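Appendix V
Proof of Theorem 5

From (33) and (41), $\bar{\beta}_{\text{DPC-BD}}$ and $\bar{\beta}_{\text{DPC-ZF}}$ are given by

$$\bar{\beta}_{\text{DPC-BD}} = \log_2 e\left(\sum_{l=0}^{KN-1}\psi(M-l) - K\sum_{n=0}^{N-1}\psi(M-(K-1)N-n)\right),$$

$$\bar{\beta}_{\text{DPC-ZF}} = \log_2 e\left(\sum_{l=0}^{KN-1}\psi(M-l) - KN\,\psi(M-KN+1)\right).$$

From the assumption $M = \alpha KN$ ($\alpha \ge 1$) with $N > 1$,

$$\begin{aligned}
\frac{1}{\log_2 e}\left(\bar{\beta}_{\text{DPC-ZF}} - \bar{\beta}_{\text{DPC-BD}}\right) &= K\left(\sum_{n=0}^{N-1}\psi(M-(K-1)N-n) - N\,\psi(M-KN+1)\right)\\
&= K\sum_{n=2}^{N}\left[\psi(M-KN+n) - \psi(M-KN+1)\right]\\
&= K\sum_{n=2}^{N}\ \sum_{j=M-KN+1}^{M-KN+n-1}\frac{1}{j}\\
&= K\sum_{i=1}^{N-1}\frac{N-i}{M-KN+i}\\
&= K\sum_{i=1}^{N-1}\frac{N-i}{(\alpha-1)KN+i}.
\end{aligned}$$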
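Appendix VI
Derivation of (46), (47), and (48)

From the DPC sum-rate expression with uniform power allocation, we have

$$C_{\text{DPC}}(\mathbf{H},P) \approx \log_2\left|\mathbf{I} + \frac{P}{KN}\tilde{\mathbf{H}}^H\boldsymbol{\Gamma}\tilde{\mathbf{H}}\right|,$$

where $\boldsymbol{\Gamma} = \operatorname{diag}(\gamma_1,\dots,\gamma_K)\otimes\mathbf{I}_{N\times N}$. By $|\mathbf{I}+\mathbf{AB}| = |\mathbf{I}+\mathbf{BA}|$,

$$\begin{aligned}
C_{\text{DPC}}(\mathbf{H},P) &\approx \log_2\left|\mathbf{I} + \frac{P}{KN}\boldsymbol{\Gamma}\tilde{\mathbf{H}}\tilde{\mathbf{H}}^H\right|\\
&= KN\log_2 P + \log_2\left|\frac{1}{P}\mathbf{I} + \frac{1}{KN}\boldsymbol{\Gamma}\tilde{\mathbf{H}}\tilde{\mathbf{H}}^H\right|\\
&\approx KN\log_2 P - KN\log_2 KN + \log_2|\boldsymbol{\Gamma}| + \log_2|\tilde{\mathbf{H}}\tilde{\mathbf{H}}^H|\\
&\approx \tilde{C}_{\text{DPC}}(\tilde{\mathbf{H}},P) + N\sum_{k=1}^{K}\log_2\gamma_k.
\end{aligned}$$

Since the zero-forcing vector $\mathbf{v}_{k,n}$ for $\mathbf{h}_{k,n}$ is identical to the zero-forcing vector $\tilde{\mathbf{v}}_{k,n}$ for $\tilde{\mathbf{h}}_{k,n}$, the effective channel gain is given by

$$g_{k,n} = \mathbf{h}_{k,n}\mathbf{v}_{k,n} = \sqrt{\gamma_k}\,\tilde{g}_{k,n},$$

where $\tilde{g}_{k,n} = \tilde{\mathbf{h}}_{k,n}\tilde{\mathbf{v}}_{k,n}$. Thus the ZF sum rate (20) can be modified as

$$\begin{aligned}
C_{\text{ZF}}(\mathbf{H},P) &= \sum_{k=1}^{K}\sum_{n=1}^{N}\log_2\left(1 + \frac{P}{KN}\gamma_k|\tilde{g}_{k,n}|^2\right)\\
&= KN\log_2 P + \sum_{k=1}^{K}\sum_{n=1}^{N}\log_2\left(\frac{1}{P} + \frac{\gamma_k|\tilde{g}_{k,n}|^2}{KN}\right)\\
&\approx KN\log_2 P + \sum_{k=1}^{K}\sum_{n=1}^{N}\log_2\frac{\gamma_k|\tilde{g}_{k,n}|^2}{KN}\\
&\approx \tilde{C}_{\text{ZF}}(\tilde{\mathbf{H}},P) + N\sum_{k=1}^{K}\log_2\gamma_k.
\end{aligned} \tag{73}$$

Likewise, for BD, $\mathbf{V}_k = \tilde{\mathbf{V}}_k$ leads to

$$\mathbf{G}_k = \mathbf{H}_k\mathbf{V}_k = \sqrt{\gamma_k}\,\tilde{\mathbf{G}}_k, \tag{74}$$

where $\tilde{\mathbf{G}}_k = \tilde{\mathbf{H}}_k\tilde{\mathbf{V}}_k$. Thus, the BD sum rate in (23) is modified to

$$\begin{aligned}
C_{\text{BD}}(\mathbf{H},P) &= \sum_{k=1}^{K}\log_2\left|\mathbf{I} + \frac{P}{KN}\gamma_k\tilde{\mathbf{G}}_k\tilde{\mathbf{G}}_k^H\right|\\
&= KN\log_2 P + \sum_{k=1}^{K}\log_2\left|\frac{1}{P}\mathbf{I} + \frac{\gamma_k}{KN}\tilde{\mathbf{G}}_k\tilde{\mathbf{G}}_k^H\right|\\
&\approx KN\log_2 P + \sum_{k=1}^{K}\log_2\left|\frac{\gamma_k}{KN}\tilde{\mathbf{G}}_k\tilde{\mathbf{G}}_k^H\right|\\
&\approx \tilde{C}_{\text{BD}}(\tilde{\mathbf{H}},P) + N\sum_{k=1}^{K}\log_2\gamma_k.
\end{aligned}$$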

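Appendix VII
Decoupling Lemma

Lemma 3: Let $\{\mathbf{H}_j\}_{j=1}^{K}$ ($\mathbf{H}_j \in \mathbb{C}^{N\times M}$) be a set of $K$-user MIMO broadcast channel matrices with $M \ge KN$. If $\mathbf{F}_k$ ($k = 1,\dots,K$) is the projection of $\mathbf{H}_k$ onto the nullspace of $\{\mathbf{H}_j\}_{j=1}^{k-1}$ (i.e., $\mathbf{F}_k = \mathbf{H}_k\mathbf{P}_k^{\perp}$, where $\mathbf{P}_k^{\perp}$ denotes the projection onto the nullspace of $\{\mathbf{H}_j\}_{j=1}^{k-1}$), then

$$\lim_{P\to\infty}\left[\mathbf{H}_k(\mathbf{A}^{(k-1)})^{-1}\mathbf{H}_k^H - \mathbf{F}_k\mathbf{F}_k^H\right] = \mathbf{0}, \quad k = 1,\dots,K, \tag{75}$$

where $\mathbf{A}^{(k)} = \mathbf{I} + \sum_{j=1}^{k}\mathbf{H}_j^H\mathbf{Q}_j\mathbf{H}_j$ for $k \ge 1$ and $\mathbf{A}^{(0)} = \mathbf{I}$.

Proof: Let $\mathbf{U}$ be the eigenvector matrix of $\sum_{j=1}^{k-1}\mathbf{H}_j^H\mathbf{Q}_j\mathbf{H}_j$ and let $\lambda_1,\dots,\lambda_{(k-1)N}$ be its non-zero eigenvalues ($\lambda_j > 0$). Then

$$(\mathbf{A}^{(k-1)})^{-1/2} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H,$$

where

$$\boldsymbol{\Lambda} = \operatorname{diag}\left(\frac{1}{\sqrt{1+\lambda_1}},\dots,\frac{1}{\sqrt{1+\lambda_{(k-1)N}}},1,\dots,1\right).$$

As $P$ goes to infinity, the $\lambda_j$'s tend to infinity. Thus, the first $(k-1)N$ diagonal entries of $\boldsymbol{\Lambda}$ converge to 0. The eigenvectors corresponding to the unit entries span the nullspace of $\{\mathbf{H}_j\}_{j=1}^{k-1}$; i.e.,

$$\lim_{P\to\infty}\left[(\mathbf{A}^{(k-1)})^{-1/2} - \mathbf{P}_k^{\perp}\right] = \mathbf{0}.$$

Hence $\mathbf{H}_k(\mathbf{A}^{(k-1)})^{-1}\mathbf{H}_k^H = \mathbf{H}_k(\mathbf{A}^{(k-1)})^{-1/2}(\mathbf{A}^{(k-1)})^{-1/2}\mathbf{H}_k^H \to \mathbf{H}_k\mathbf{P}_k^{\perp}(\mathbf{P}_k^{\perp})^H\mathbf{H}_k^H = \mathbf{F}_k\mathbf{F}_k^H$. This completes the proof.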
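For $N = 1$ and $K = 2$ the lemma can be checked with the Sherman-Morrison formula, since $\mathbf{A}^{(1)} = \mathbf{I} + P_1\mathbf{h}_1^H\mathbf{h}_1$ is a rank-one update of the identity. The sketch below uses hypothetical real-valued channel rows (the lemma itself is stated for complex channels) and shows $\mathbf{h}_2(\mathbf{A}^{(1)})^{-1}\mathbf{h}_2^H$ decreasing to $\|\mathbf{f}_2\|^2$ as the power $P_1$ of user 1 grows.

```python
def dot(a, b):
    """Real inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def quad_form_inv(h2, h1, p1):
    """h2 (I + p1 h1^T h1)^{-1} h2^T via Sherman-Morrison:
    (I + p u u^T)^{-1} = I - p u u^T / (1 + p u^T u)."""
    return dot(h2, h2) - p1 * dot(h2, h1) ** 2 / (1.0 + p1 * dot(h1, h1))


def proj_norm_sq(h2, h1):
    """||f_2||^2: squared norm of h2 after removing its component along h1."""
    return dot(h2, h2) - dot(h2, h1) ** 2 / dot(h1, h1)


h1 = [1.0, 0.5, -0.3]   # hypothetical channel rows, M = 3
h2 = [0.2, 1.0, 0.7]
target = proj_norm_sq(h2, h1)
for p1 in (1.0, 1e2, 1e4, 1e6):
    print(p1, quad_form_inv(h2, h1, p1), target)
```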
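Appendix VIII
Proof of Theorem 8

With the decoupling lemma in Appendix VII, the optimization (65) can be decomposed into two optimizations at high SNR:

$$C_{\text{BC}}(\boldsymbol{\mu},\mathbf{H},P) \approx \max_{\sum_{k=1}^{K}P_k \le P}\ \sum_{k=1}^{K}\mu_k\,\xi_k(P_k), \tag{76}$$

where

$$\xi_k(P_k) = \max_{\operatorname{tr}(\mathbf{Q}_k)\le P_k}\ \log_2\left|\mathbf{I} + \mathbf{Q}_k\mathbf{F}_k\mathbf{F}_k^H\right|. \tag{77}$$

At high SNR, (77) can be asymptotically expressed as an affine approximation [3]:

$$\xi_k(P_k) = \mathcal{S}_{\infty,k}\left(\log_2 P_k - \mathcal{L}_{\infty,k}\right) + o(1), \tag{78}$$

where $\mathcal{S}_{\infty,k}$ and $\mathcal{L}_{\infty,k}$ are the multiplexing gain and power offset, respectively. Hence, the optimization (76) is asymptotically equivalent to solving

$$\max_{\sum_{k=1}^{K}P_k \le P}\ \sum_{k=1}^{K}\mathcal{S}_{\infty,k}\,\mu_k\log_2 P_k.$$

Since $\mathbf{F}_k$ has rank $N$ when $M \ge KN$, $\mathcal{S}_{\infty,k} = N$ for every $k$, and this leads to the optimal $P_k = \mu_k P$. Furthermore, by Theorem 3 of [1], the optimal power allocation for each user is asymptotically achieved by

$$\mathbf{Q}_k^* = \frac{P_k}{N}\mathbf{I} = \frac{\mu_k P}{N}\mathbf{I}, \quad k = 1,\dots,K.$$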